emnlp emnlp2010 emnlp2010-4 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dave Golland ; Percy Liang ; Dan Klein
Abstract: Language is sensitive to both semantic and pragmatic effects. To capture both effects, we model language use as a cooperative game between two players: a speaker, who generates an utterance, and a listener, who responds with an action. Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions.
Reference: text
sentIndex sentText sentNum sentScore
1 Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. [sent-9, score-0.866]
2 We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions. [sent-10, score-1.114]
3 1 Introduction Language is about successful communication between a speaker and a listener. [sent-11, score-0.425]
4 The speaker’s goal is to reference the target object O1 by describing its spatial relationship to other object(s). [sent-16, score-0.325]
5 Indeed, although both utterances (a) and (b) are semantically valid, only (b) is pragmatically felicitous: (a) is ambiguous and therefore violates the Gricean maxim of manner (Grice, 1975). [sent-21, score-0.312]
6 We present our pragmatic model in a grounded setting where a speaker must describe a target object to a listener via spatial description (such as in the example given above). [sent-36, score-1.318]
7 2 Language as a Game To model Grice’s cooperative principle (Grice, 1975), we formulate the interaction between a speaker S and a listener L as a cooperative game, that is, one in which S and L share the same utility function. [sent-38, score-1.083]
8 For simplicity, we focus on the production and interpretation of single utterances, where the speaker and listener have access to a shared context. [sent-39, score-0.97]
9 A target, o, is given to the speaker, who generates an utterance w. [sent-42, score-0.518]
10 Based on this utterance, the listener generates a guess g. [sent-43, score-0.696]
11 If o = g, then both the listener and speaker get a utility of 1; otherwise they get a utility of 0. [sent-44, score-1.113]
12 This communication game is described graphically in Figure 3. Figure 3: Three instances of the communication game on the scenario in Figure 1, with utterances on O3 (utility 1), near O3 (utility 0), and right of O2 (utility 0). [sent-45, score-0.311]
13 For each instance, the target o, utterance w, guess g, and the resulting utility U are shown in their respective positions. [sent-46, score-0.389]
14 Grice's maxim of manner encourages utterances to be unambiguous, which motivates the following utility, which we call (communicative) success: U(o, g) =def I[o = g], (1) where the indicator function I[o = g] is 1 if o = g and 0 otherwise. [sent-50, score-0.307]
15 Hence, a utility-maximizing speaker will attempt to produce unambiguous utterances because they increase the probability that the listener will correctly guess the target. [sent-51, score-1.299]
16 Given a speaker strategy pS(w | o), a listener strategy pL(g | w), and a prior distribution over targets p(o), the expected utility obtained by S and L is as follows: EU(S, L) = Σ_{o,w,g} p(o) pS(w | o) pL(g | w) U(o, g) = Σ_{o,w} p(o) pS(w | o) pL(o | w). (2) [sent-52, score-1.075]
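When the sets of targets, utterances, and guesses are small, the expected utility in (2) can be computed by direct enumeration. Below is a minimal sketch; the function names and dictionary layout are our own, not the paper's:

```python
from itertools import product

def utility(o, g):
    """Communicative success U(o, g) = I[o = g], equation (1)."""
    return 1.0 if o == g else 0.0

def expected_utility(prior, p_speaker, p_listener, targets, utterances):
    """Equation (2): EU(S, L) = sum over (o, w) of p(o) * pS(w|o) * pL(o|w).

    Because U is the indicator in (1), the sum over guesses g collapses
    to the single term g = o.

    prior:      dict mapping target o -> p(o)
    p_speaker:  dict mapping (w, o) -> pS(w | o)
    p_listener: dict mapping (g, w) -> pL(g | w)
    """
    return sum(
        prior[o] * p_speaker.get((w, o), 0.0) * p_listener.get((o, w), 0.0)
        for o, w in product(targets, utterances)
    )
```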
17 3 From Reflex Speaker to Rational Speaker Having formalized the language game, we now explore various speaker and listener strategies. [sent-53, score-0.949]
18 A literal speaker (denoted S:LITERAL) chooses uniformly from the set of utterances consistent with a target object, i.e., the ones which are semantically valid. [sent-55, score-0.978]
19 A literal listener (denoted L:LITERAL) guesses an object consistent with the utterance uniformly at random. [sent-57, score-1.256]
20 In the running example (Figure 1), where the target object is O1, there are two semantically valid utterances: (a) right of O2, (b) on O3. S:LITERAL selects (a) or (b) each with probability 1/2. [sent-58, score-0.262]
21 If S:LITERAL chooses (a), L:LITERAL will guess the target object O1 correctly with probability 1/2; if S:LITERAL chooses (b), L:LITERAL will guess correctly with probability 1. [sent-59, score-0.46]
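The reflex strategies are easy to reproduce in code. The truth table below is our own hypothetical encoding of which utterances hold of which objects in Figure 1:

```python
import random

# Hypothetical encoding of Figure 1: each utterance maps to the set of objects
# it is semantically true of. "right of O2" fits both the vase O1 and the
# table O3 (ambiguous); "on O3" fits only O1 (unambiguous).
TRUE_OF = {
    "right of O2": {"O1", "O3"},
    "on O3": {"O1"},
}

def literal_speaker(target):
    """S:LITERAL: pick uniformly among utterances semantically valid for target."""
    valid = [w for w, objs in TRUE_OF.items() if target in objs]
    return random.choice(valid)

def literal_listener(utterance):
    """L:LITERAL: guess uniformly among objects consistent with the utterance."""
    return random.choice(sorted(TRUE_OF[utterance]))
```

Calling literal_listener("right of O2") returns O1 only about half the time, matching the probability 1/2 noted above.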
22 We say S:LITERAL is an example of a reflex speaker because it chooses an utterance without taking the listener into account. [sent-61, score-1.336]
23 A general reflex speaker is depicted in Figure 4(a), where each edge represents a potential utterance. [sent-62, score-0.528]
24 We call the resulting speaker S(L) the rational speaker with respect to listener L. [sent-65, score-1.296]
25 Figure 4: (a) A reflex speaker (S) directly selects an utterance based only on the target object. [sent-70, score-1.435]
26 (b) A rational speaker (S(L)) selects an utterance based on an embedded model of the listener (L). [sent-72, score-1.428]
27 Each edge in the first layer represents a different choice the speaker can make, and each edge in the second layer represents a response of the listener. [sent-73, score-0.389]
28 Intuitively, S(L) chooses an utterance, w∗, such that, if listener L were to interpret w∗, the probability of L guessing the target would be maximized. [sent-74, score-0.713]
29 The rational speaker is depicted in Figure 4(b), where, as before, each edge at the first level represents a possible choice for the speaker, but there is now a second layer representing the response of the listener. [sent-75, score-0.615]
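In code, the rational speaker is a one-step lookahead over whatever listener model is embedded. A sketch of the argmax in (3), with our own naming; any listener, literal or learned, can be plugged in as listener_prob:

```python
def rational_speaker(target, candidate_utterances, listener_prob):
    """S(L): choose w* = argmax over w of pL(target | w), as in equation (3).

    listener_prob(g, w) returns the embedded listener's probability pL(g | w).
    """
    return max(candidate_utterances, key=lambda w: listener_prob(target, w))
```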
30 To see how an embedded model of the listener improves communication, again consider our running example in Figure 1. [sent-76, score-0.663]
31 A speaker can describe the target object O1 using either w1 = on O3 or w2 = right of O2 . [sent-77, score-0.584]
32 Suppose the embedded listener is L:LITERAL, which chooses uniformly from the set of objects consistent with the given utterance. [sent-78, score-0.797]
33 The rational speaker S(L:LITERAL) would therefore choose w1, achieving a utility of 1, which is an improvement over the reflex speaker S:LITERAL's utility of 3/4. [sent-80, score-1.286]
34 This section focuses on an orthogonal direction: improving literal strategies with learning. [sent-83, score-0.298]
35 These learned strategies can then be used to construct reflex and rational speaker variants: S:LEARNED and S(L:LEARNED), respectively. [sent-85, score-0.822]
36 4.1 Training a Log-Linear Speaker/Listener We train the speaker S:LEARNED (and, similarly, the listener L:LEARNED) on training examples comprising the utterances produced by the human annotators (see Section 6). [sent-87, score-0.287]
37 For now, an utterance w consists of two parts: a spatial preposition w.r and a reference object w.o. [sent-92, score-0.275]
38 Both S:LEARNED and L:LEARNED are parametrized by log-linear models: pS:LEARNED(w | o; θS) ∝ exp{θS⊤φ(o, w)} (4), pL:LEARNED(g | w; θL) ∝ exp{θL⊤φ(g, w)} (5), where φ(·, ·) is the feature vector (see below), and θS and θL are the parameter vectors for speaker and listener. [sent-103, score-0.347]
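Equations (4) and (5) are standard conditional log-linear (softmax) models that share one feature function but normalize over different spaces: utterances for the speaker and objects for the listener. A sketch with names of our own choosing:

```python
import math

def log_linear(candidates, feature_fn, theta):
    """Normalize scores exp(theta . phi(x)) into a distribution over candidates."""
    scores = {x: sum(t * f for t, f in zip(theta, feature_fn(x))) for x in candidates}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {x: math.exp(s - m) for x, s in scores.items()}
    z = sum(exps.values())
    return {x: e / z for x, e in exps.items()}

def p_speaker_learned(o, utterances, phi, theta_s):
    """Equation (4): pS(w | o) proportional to exp(theta_s . phi(o, w))."""
    return log_linear(utterances, lambda w: phi(o, w), theta_s)

def p_listener_learned(w, objects, phi, theta_l):
    """Equation (5): pL(g | w) proportional to exp(theta_l . phi(g, w))."""
    return log_linear(objects, lambda g: phi(g, w), theta_l)
```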
39 Note that the speaker and listener use the same feature vector φ. (Footnote 3: We chose 10 prepositions commonly used by people to describe objects in a preliminary data gathering experiment.) [sent-104, score-1.073]
40 Furthermore, the first normalization sums over possible utterances w while the second normalization sums over possible objects g in the scene. [sent-107, score-0.355]
41 The following are functions of the camera, target (or guessed object) o, and the reference object w.o. [sent-112, score-0.29]
42 Figure 5: The projection features are computed by projecting a vector v extending from the center of the reference object to the center of the target object onto the axes fx and fy. [sent-133, score-0.288]
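A plausible reading of Figure 5, with the caveat that the paper's full feature set is not reproduced in this extraction, so treat the details as our assumption: the projection features are dot products of the reference-to-target vector with camera-derived axes.

```python
import numpy as np

def projection_features(target_center, reference_center, fx, fy):
    """Project v = target_center - reference_center onto camera axes fx and fy.

    All arguments are length-3 arrays; fx and fy are assumed to be unit
    vectors derived from the camera orientation.
    """
    v = np.asarray(target_center, dtype=float) - np.asarray(reference_center, dtype=float)
    return {"proj-fx": float(np.dot(v, fx)), "proj-fy": float(np.dot(v, fy))}
```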
43 5 Handling Complex Utterances So far, we have only considered speakers and listeners that deal with utterances consisting of one preposition and one reference object. [sent-134, score-0.408]
44 For example, in Figure 1, right of O2 could refer to the vase or the table, but using the conjunction right of O2 and on O3 narrows down the target object to just the vase. [sent-138, score-0.303]
45 We had only considered utterances of complexity 1 in previous sections. [sent-143, score-0.291]
46 5.1 Example Utterances To illustrate the types of utterances available under the grammar, again consider the scene in Figure 1. [sent-145, score-0.297]
47 Utterances of complexity 2 can be generated either using the relativization rule exclusively, or both the conjunction and relativization rules. [sent-146, score-0.258]
48 This is to help a human listener interpret an utterance. [sent-148, score-0.667]
49 5.2 Extending the Rational Speaker Suppose we have a rational speaker S(L) defined in terms of an embedded listener L which operates over utterances of complexity 1. [sent-150, score-1.548]
50 We first extend L to interpret arbitrary utterances of our grammar. [sent-151, score-0.29]
51 The rational speaker (defined in (3)) automatically inherits this extension. [sent-152, score-0.594]
52 Compositional semantics allows us to define the interpretation of complex utterances in terms of simpler ones. [sent-153, score-0.33]
53 As a base case for interpreting utterances of complexity 1, we can use either L:LITERAL or L:LEARNED (defined in Sections 3 and 4). [sent-156, score-0.317]
54 Figure 6 shows an example of this bottom-up denotation computation for the utterance on something right of O2 with respect to the scene in Figure 1. [sent-163, score-0.332]
55 Figure 6: The listener model maps an utterance to a distribution over objects in the room. [sent-170, score-0.872]
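The bottom-up computation can be sketched as a recursion over the utterance's parse tree. The tree encoding and combination rules below are our own simplification of the paper's compositional semantics, not its exact formulation:

```python
def interpret(utterance, base_listener, objects):
    """Recursively map an utterance tree to a distribution over objects.

    Tree encoding (ours):
      ("simple", prep, ref)  complexity-1 base case, handled by base_listener
      ("and", left, right)   conjunction: intersect by multiplying denotations
      ("rel", prep, inner)   relativization: "prep something", marginalizing
                             over the hidden reference object bound by inner
    base_listener(prep, ref) returns a dict g -> p(g | "prep ref").
    """
    kind = utterance[0]
    if kind == "simple":
        _, prep, ref = utterance
        scores = dict(base_listener(prep, ref))
    elif kind == "and":
        left = interpret(utterance[1], base_listener, objects)
        right = interpret(utterance[2], base_listener, objects)
        scores = {g: left.get(g, 0.0) * right.get(g, 0.0) for g in objects}
    else:  # "rel"
        _, prep, inner = utterance
        inner_dist = interpret(inner, base_listener, objects)
        scores = {g: 0.0 for g in objects}
        for z, p_z in inner_dist.items():
            for g, p_g in base_listener(prep, z).items():
                scores[g] += p_z * p_g
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()} if total > 0 else scores
```

For the example above, interpret(("rel", "on", ("simple", "right of", "O2")), ...) first computes the denotation of right of O2 and then marginalizes over the hidden object bound by something.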
56 Generation strategy: So far, we have defined the listener pL(g | w). [sent-172, score-0.624]
57 Given target o, the rational speaker S(L) with respect to this listener needs to compute argmax_w pL(o | w) as dictated by (3). [sent-173, score-1.238]
58 5.3 Modeling Listener Confusion One shortcoming of the previous approach for extending a listener is that it falsely assumes that a listener can reliably interpret a complex utterance just as well as it can a simple utterance. [sent-177, score-1.442]
59 We now describe a more realistic speaker which is robust to listener confusion. [sent-178, score-0.949]
60 When presented with an utterance w, for each application of the relativization rule, we have a 1 − α probability of losing focus. [sent-182, score-0.267]
61 We then use (3) to define the rational speaker S(L). [sent-185, score-0.594]
62 As α → 0, the confused listener is more likely to make a random guess, and thus there is a stronger penalty against using more complex utterances. [sent-187, score-0.657]
63 As α → 1, the confused listener converges to pL and the penalty for using complex utterances vanishes. [sent-188, score-0.657]
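One way to realize this behavior (our reading of equation (7), whose exact form is garbled in this extraction): the listener stays focused with probability α per application of relativization and otherwise guesses uniformly at random.

```python
def confused_listener(p_l, num_relativizations, alpha, objects):
    """Mix pL with a uniform guess: with probability alpha ** num_relativizations
    the listener stays focused and follows pL; otherwise it guesses at random.

    p_l is a dict g -> pL(g | w) for the utterance in question.
    """
    focus = alpha ** num_relativizations
    uniform = 1.0 / len(objects)
    return {g: focus * p_l.get(g, 0.0) + (1.0 - focus) * uniform for g in objects}
```

As the code makes explicit, alpha near 0 with at least one relativization drives the mixture toward the uniform guess, and alpha near 1 recovers pL.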
64 5.4 The Taboo Setting Notice that the rational speaker as defined so far does not make full use of our grammar. [sent-190, score-0.594]
65 Specifically, the rational speaker will never use the “wildcard” noun something nor the relativization rule in the grammar because an NP headed by the wildcard something can always be replaced by the object ID to obtain a higher utility. [sent-191, score-0.974]
66 Since the tabooed objects cannot be referenced directly, a speaker must resort to use of the wildcard something and relativization. [sent-196, score-0.552]
67 This prevents the speaker from referring directly to O3, so the speaker is forced to describe O3 via the relativization rule, for example, producing something right of O2 . [sent-198, score-0.869]
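Tabooing amounts to a constraint on the speaker's candidate set: any utterance whose tree mentions a tabooed object ID is filtered out before the argmax in (3). A sketch using the hypothetical tree encoding from the listener example above:

```python
def mentions_tabooed(utterance, tabooed):
    """True if any object ID appearing in the utterance tree is tabooed."""
    kind = utterance[0]
    if kind == "simple":
        return utterance[2] in tabooed  # the reference object ID
    if kind == "and":
        return (mentions_tabooed(utterance[1], tabooed)
                or mentions_tabooed(utterance[2], tabooed))
    return mentions_tabooed(utterance[2], tabooed)  # "rel": only the inner tree names IDs

def allowed_utterances(candidates, tabooed):
    """Candidate utterances the speaker may still use in the taboo setting."""
    return [w for w in candidates if not mentions_tabooed(w, tabooed)]
```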
68 6 Experiments We now present our empirical results, showing that rational speakers, who have embedded models of listeners, substantially outperform reflex speakers. [sent-200, score-0.857]
69 Figure 8: Mechanical Turk speaker task: given the target object (e.g., O1), a human speaker must choose an utterance to describe the object (e.g., ...). [sent-202, score-0.709]
70 For each object o in a scene, we create a scenario, which represents an instance of the communication game with o as the target object. [sent-208, score-0.344]
71 They are prompted with a target object o and asked to each produce an utterance w, by selecting a preposition w.r and a reference object w.o. [sent-211, score-0.325]
72 The utterance should best inform a listener of the identity of the target object. [sent-213, score-0.644]
73 For each training scenario o, we asked three speakers to produce an utterance w. [sent-214, score-0.317]
74 The three resulting (o, w) pairs are used to train the learned reflex speaker (S:LEARNED). [sent-215, score-0.575]
75 These pairs were also used to train the learned reflex listener (L:LEARNED), where the target o is treated as the guessed object. [sent-216, score-0.918]
76 Given an utterance generated by a speaker (human or not), the human listener must guess the target object that the speaker saw. [sent-220, score-1.346]
77 Figure 9: Mechanical Turk listener task: a human listener is prompted with an utterance generated by a speaker (e.g., "Question: What object is right of O2?"). [sent-221, score-1.753]
78 The listener guesses by clicking on an object. [sent-224, score-0.667]
79 The purpose of the listener task is to evaluate speakers, as described in the next section. [sent-225, score-0.602]
80 6.2 Evaluation Utility (Communicative Success) We primarily evaluate a speaker by its ability to communicate successfully with a human listener. [sent-227, score-0.378]
81 For example, if two of the three listeners guessed O1, then pL:HUMAN(O1 | w) = 2/3. The expected utility (2) is then computed by averaging the utility (communicative success) over the test scenarios TS. [sent-230, score-0.267]
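Concretely, pL:HUMAN is estimated empirically from the listeners' guesses per utterance and plugged into (2) with a uniform average over test scenarios. A sketch with our own data layout:

```python
from collections import Counter

def human_listener_dist(guesses):
    """Empirical pL:HUMAN(g | w) from the human guesses for one utterance."""
    counts = Counter(guesses)
    return {g: c / len(guesses) for g, c in counts.items()}

def communicative_success(test_targets, speaker_dist, guesses_for):
    """Average over scenarios of sum over w of pS(w | o) * pL:HUMAN(o | w).

    test_targets: the target objects o in TS
    speaker_dist(o): dict w -> pS(w | o)
    guesses_for(w): list of human guesses collected for utterance w
    """
    total = 0.0
    for o in test_targets:
        for w, p_w in speaker_dist(o).items():
            total += p_w * human_listener_dist(guesses_for(w)).get(o, 0.0)
    return total / len(test_targets)
```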
82 Exact Match As a secondary evaluation metric, we also measure the ability of our speaker to exactly match an utterance produced by a human speaker. [sent-232, score-0.571]
83 We asked three human speakers to each produce an utterance w given a target o. [sent-234, score-0.363]
84 Table 1: Communicative success and exact match, where only utterances of complexity 1 are allowed. [sent-238, score-0.425]
85 The rational speakers (with respect to both the literal listener L:LITERAL and the learned listener L:LEARNED) perform better than their reflex counterparts. [sent-239, score-2.074]
86 While the human speaker (composed of three people) has higher exact match (it is better at mimicking itself), the rational speaker S(L:LEARNED) actually achieves higher communicative success than the human speaker. [sent-240, score-1.231]
87 We define the exact match of a speaker S as follows: MATCH(S) = (1/|TS|) Σ_{o∈TS} Σ_w pS:HUMAN(w | o) pS(w | o). [sent-241, score-0.396]
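The metric is the expected agreement between the model's utterance distribution and the empirical human speaker distribution; a direct transcription of the formula, with an interface of our own:

```python
def exact_match(test_targets, human_speaker_dist, model_speaker_dist):
    """MATCH(S) = (1/|TS|) * sum over o in TS, w of pS:HUMAN(w | o) * pS(w | o).

    Both arguments map a target o to a dict w -> probability.
    """
    total = 0.0
    for o in test_targets:
        human = human_speaker_dist(o)
        model = model_speaker_dist(o)
        total += sum(p * model.get(w, 0.0) for w, p in human.items())
    return total / len(test_targets)
```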
88 6.3 Reflex versus Rational Speakers We first evaluate speakers in the setting where only utterances of complexity 1 are allowed. [sent-243, score-0.388]
89 First, our main result is that the two rational speakers S(L:LITERAL) and S(L:LEARNED), which each model a listener explicitly, perform significantly better than the corresponding reflex speakers, both in terms of success and exact match. [sent-245, score-1.261]
90 Second, it is natural that the speakers that involve learning (S:LEARNED and S(L:LEARNED)) outperform the speakers that only consider the literal meaning of utterances (S:LITERAL and S(L:LITERAL)), as the former models capture subtler preferences using features. [sent-246, score-0.748]
91 At the same time, the success rates for all speakers are rather low, reflecting the fundamental difficulty of the setting: sometimes it is impossible to unambiguously evoke the target object via short utterances. [sent-259, score-0.456]
92 6.4 Generating More Complex Utterances We now evaluate the rational speaker S(L:LEARNED) when it is allowed to generate utterances of complexity 1 or 2. [sent-262, score-0.885]
93 Recall from Section 5.3 that the speaker depends on a focus parameter α, which governs the embedded listener's ability to interpret the utterance. [sent-264, score-0.442]
94 When α is small, the embedded listener is confused more easily by more complex utterances; therefore the speaker tends to choose mostly utterances of complexity 1. [sent-267, score-1.356]
95 As α increases, the utterances increase in complexity, as does the success rate. [sent-268, score-0.363]
96 However, when α approaches 1, the utterances are too complex and the success rate decreases. [sent-269, score-0.396]
97 Table 2 shows the success rates on TSFINAL for α → 0 (all utterances have complexity 1), α = 1 (all utterances have complexity 2), and α tuned to maximize the success rate based on TSDEV. [sent-271, score-0.796]
98 Table 2: Success rates of the rational speaker S(L:LEARNED) for various values of α across different taboo amounts. [sent-285, score-0.654]
99 7 Conclusion Starting with the view that the purpose of language is successful communication, we developed a game-theoretic model in which a rational speaker generates utterances by explicitly taking the listener into account. [sent-288, score-1.474]
100 On the task of generating spatial descriptions, we showed the rational speaker substantially outperforms a baseline reflex speaker that does not have an embedded model. [sent-289, score-1.287]
wordName wordTfidf (topN-words)
[('listener', 0.602), ('speaker', 0.347), ('literal', 0.298), ('utterances', 0.256), ('rational', 0.247), ('reflex', 0.181), ('utterance', 0.171), ('object', 0.16), ('pl', 0.123), ('success', 0.107), ('spatial', 0.104), ('objects', 0.099), ('speakers', 0.097), ('relativization', 0.096), ('guess', 0.094), ('game', 0.086), ('utility', 0.082), ('communicative', 0.072), ('embedded', 0.061), ('taboo', 0.06), ('ps', 0.057), ('communication', 0.056), ('rp', 0.048), ('grice', 0.048), ('tellex', 0.048), ('learned', 0.047), ('camera', 0.046), ('guessed', 0.046), ('something', 0.044), ('target', 0.042), ('denotation', 0.041), ('scene', 0.041), ('pragmatics', 0.04), ('bounding', 0.037), ('np', 0.036), ('listeners', 0.036), ('tabooing', 0.036), ('wildcard', 0.036), ('right', 0.035), ('berkeley', 0.035), ('chooses', 0.035), ('complexity', 0.035), ('grounded', 0.034), ('roy', 0.034), ('interpret', 0.034), ('complex', 0.033), ('conjunction', 0.031), ('human', 0.031), ('maxim', 0.031), ('pragmatic', 0.029), ('recursively', 0.029), ('zettlemoyer', 0.028), ('scenario', 0.027), ('exact', 0.027), ('rooted', 0.026), ('cooperative', 0.026), ('interpreting', 0.026), ('referenced', 0.026), ('turk', 0.026), ('uc', 0.026), ('unambiguously', 0.026), ('prepositions', 0.025), ('semantically', 0.025), ('clicking', 0.024), ('deb', 0.024), ('devault', 0.024), ('evoke', 0.024), ('fleischman', 0.024), ('gricean', 0.024), ('kollar', 0.024), ('landau', 0.024), ('piantadosi', 0.024), ('prnd', 0.024), ('regier', 0.024), ('sketchup', 0.024), ('tsfinal', 0.024), ('grounding', 0.024), ('mechanical', 0.024), ('functions', 0.023), ('match', 0.022), ('asked', 0.022), ('successful', 0.022), ('strategy', 0.022), ('confused', 0.022), ('eu', 0.022), ('interpretation', 0.021), ('scenarios', 0.021), ('advancement', 0.021), ('axes', 0.021), ('brain', 0.021), ('feldman', 0.021), ('gorniak', 0.021), ('instructions', 0.021), ('descriptions', 0.021), ('layer', 0.021), ('semantics', 0.02), ('compositional', 0.02), ('indicator', 0.02), ('reference', 0.019), ('projection', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions
Author: Dave Golland ; Percy Liang ; Dan Klein
Abstract: Language is sensitive to both semantic and pragmatic effects. To capture both effects, we model language use as a cooperative game between two players: a speaker, who generates an utterance, and a listener, who responds with an action. Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions.
2 0.10162832 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats
Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin
Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.
3 0.067172319 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue
Author: Zahar Prasov ; Joyce Y. Chai
Abstract: In situated dialogue humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents partly due to their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. In addition to incorporating eye gaze with the best recognized spoken hypothesis, we developed an algorithm to also handle multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms the results obtained from processing recognition hypotheses alone. Incorporating eye gaze with word confusion networks further improves performance.
4 0.058023632 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields
Author: Wei Lu ; Hwee Tou Ng
Abstract: This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence type prediction, and punctuation prediction on speech utterances. We performed evaluations on a transcribed conversational speech domain consisting of both English and Chinese texts. Empirical results show that our method outperforms an approach based on linear-chain conditional random fields and other previous approaches.
5 0.054107554 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation
Author: Chen Zhang ; Joyce Chai
Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.
6 0.053518523 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech
7 0.048069429 84 emnlp-2010-NLP on Spoken Documents Without ASR
8 0.042693634 63 emnlp-2010-Improving Translation via Targeted Paraphrasing
9 0.041798089 54 emnlp-2010-Generating Confusion Sets for Context-Sensitive Error Correction
10 0.039319422 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing
11 0.035297573 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
12 0.03058527 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
13 0.029099325 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
14 0.026709257 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa
15 0.025593653 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions
16 0.025229014 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications
17 0.025057118 13 emnlp-2010-A Simple Domain-Independent Probabilistic Approach to Generation
18 0.02504964 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
19 0.024782605 64 emnlp-2010-Incorporating Content Structure into Text Analysis Applications
20 0.02254983 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction
topicId topicWeight
[(0, 0.098), (1, 0.03), (2, 0.011), (3, 0.032), (4, -0.026), (5, 0.018), (6, 0.018), (7, -0.078), (8, -0.01), (9, 0.009), (10, -0.138), (11, -0.185), (12, -0.024), (13, 0.138), (14, 0.069), (15, -0.125), (16, -0.077), (17, -0.046), (18, -0.063), (19, 0.076), (20, -0.155), (21, 0.014), (22, -0.082), (23, 0.06), (24, -0.074), (25, -0.019), (26, 0.117), (27, -0.076), (28, -0.091), (29, 0.182), (30, -0.118), (31, 0.071), (32, 0.032), (33, 0.009), (34, 0.036), (35, 0.113), (36, -0.102), (37, -0.008), (38, -0.205), (39, -0.007), (40, -0.021), (41, 0.041), (42, 0.124), (43, 0.091), (44, -0.023), (45, -0.127), (46, -0.165), (47, 0.05), (48, 0.069), (49, -0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.97363335 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions
Author: Dave Golland ; Percy Liang ; Dan Klein
Abstract: Language is sensitive to both semantic and pragmatic effects. To capture both effects, we model language use as a cooperative game between two players: a speaker, who generates an utterance, and a listener, who responds with an action. Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions.
2 0.66135216 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats
Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin
Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.
3 0.56944311 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue
Author: Zahar Prasov ; Joyce Y. Chai
Abstract: In situated dialogue humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents partly due to their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. In addition to incorporating eye gaze with the best recognized spoken hypothesis, we developed an algorithm to also handle multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms the results obtained from processing recognition hypotheses alone. Incorporating eye gaze with word confusion networks further improves performance.
4 0.33312887 54 emnlp-2010-Generating Confusion Sets for Context-Sensitive Error Correction
Author: Alla Rozovskaya ; Dan Roth
Abstract: In this paper, we consider the problem of generating candidate corrections for the task of correcting errors in text. We focus on the task of correcting errors in preposition usage made by non-native English speakers, using discriminative classifiers. The standard approach to the problem assumes that the set of candidate corrections for a preposition consists of all preposition choices participating in the task. We determine likely preposition confusions using an annotated corpus of nonnative text and use this knowledge to produce smaller sets of candidates. We propose several methods of restricting candidate sets. These methods exclude candidate prepositions that are not observed as valid corrections in the annotated corpus and take into account the likelihood of each preposition confusion in the non-native text. We find that restricting candidates to those that are ob- served in the non-native data improves both the precision and the recall compared to the approach that views all prepositions as possible candidates. Furthermore, the approach that takes into account the likelihood of each preposition confusion is shown to be the most effective.
5 0.23121592 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions
Author: Bin Chen ; Jian Su ; Chew Lim Tan
Abstract: unkown-abstract
6 0.21191405 13 emnlp-2010-A Simple Domain-Independent Probabilistic Approach to Generation
7 0.20174409 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
8 0.2008291 84 emnlp-2010-NLP on Spoken Documents Without ASR
9 0.18314777 83 emnlp-2010-Multi-Level Structured Models for Document-Level Sentiment Classification
10 0.17164478 75 emnlp-2010-Lessons Learned in Part-of-Speech Tagging of Conversational Speech
11 0.17029852 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields
12 0.16824961 81 emnlp-2010-Modeling Perspective Using Adaptor Grammars
13 0.16524073 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation
14 0.1570569 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing
15 0.14604202 77 emnlp-2010-Measuring Distributional Similarity in Context
16 0.13835098 71 emnlp-2010-Latent-Descriptor Clustering for Unsupervised POS Induction
17 0.13505857 9 emnlp-2010-A New Approach to Lexical Disambiguation of Arabic Text
18 0.13296621 108 emnlp-2010-Training Continuous Space Language Models: Some Practical Issues
19 0.13104358 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics
20 0.13040596 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing
topicId topicWeight
[(12, 0.032), (29, 0.07), (30, 0.017), (32, 0.032), (39, 0.409), (52, 0.025), (56, 0.084), (62, 0.02), (66, 0.076), (72, 0.035), (76, 0.039), (77, 0.016), (79, 0.012), (87, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.74878705 4 emnlp-2010-A Game-Theoretic Approach to Generating Spatial Descriptions
Author: Dave Golland ; Percy Liang ; Dan Klein
Abstract: Language is sensitive to both semantic and pragmatic effects. To capture both effects, we model language use as a cooperative game between two players: a speaker, who generates an utterance, and a listener, who responds with an action. Specifically, we consider the task of generating spatial references to objects, wherein the listener must accurately identify an object described by the speaker. We show that a speaker model that acts optimally with respect to an explicit, embedded listener model substantially outperforms one that is trained to directly generate spatial descriptions.
2 0.32566831 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar
Author: Kristian Woodsend ; Yansong Feng ; Mirella Lapata
Abstract: The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.
3 0.32185131 82 emnlp-2010-Multi-Document Summarization Using A* Search and Discriminative Learning
Author: Ahmet Aker ; Trevor Cohn ; Robert Gaizauskas
Abstract: In this paper we address two key challenges for extractive multi-document summarization: the search problem of finding the best scoring summary and the training problem of learning the best model parameters. We propose an A* search algorithm to find the best extractive summary up to a given length, which is both optimal and efficient to run. Further, we propose a discriminative training algorithm which directly maximises the quality ofthe best summary, rather than assuming a sentence-level decomposition as in earlier work. Our approach leads to significantly better results than earlier techniques across a number of evaluation metrics.
4 0.31993872 107 emnlp-2010-Towards Conversation Entailment: An Empirical Investigation
Author: Chen Zhang ; Joyce Chai
Abstract: While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limitation, this paper investigates the problem of conversation entailment: automated inference of hypotheses from conversation scripts. We examine two levels of semantic representations: a basic representation based on syntactic parsing from conversation utterances and an augmented representation taking into consideration of conversation structures. For each of these levels, we further explore two ways of capturing long distance relations between language constituents: implicit modeling based on the length of distance and explicit modeling based on actual patterns of relations. Our empirical findings have shown that the augmented representation with conversation structures is important, which achieves the best performance when combined with explicit modeling of long distance relations.
5 0.31808645 58 emnlp-2010-Holistic Sentiment Analysis Across Languages: Multilingual Supervised Latent Dirichlet Allocation
Author: Jordan Boyd-Graber ; Philip Resnik
Abstract: In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language’s data to inform how the model captures properties of other languages. MLSLDA accomplishes this by jointly modeling two aspects of text: how multilingual concepts are clustered into thematically coherent topics and how topics associated with text connect to an observed regression variable (such as ratings on a sentiment scale). Concepts are represented in a general hierarchical framework that is flexible enough to express semantic ontologies, dictionaries, clustering constraints, and, as a special, degenerate case, conventional topic models. Both the topics and the regression are discovered via posterior inference from corpora. We show MLSLDA can build topics that are consistent across languages, discover sensible bilingual lexical correspondences, and leverage multilingual corpora to better predict sentiment. Sentiment analysis (Pang and Lee, 2008) offers the promise of automatically discerning how people feel about a product, person, organization, or issue based on what they write online, which is potentially of great value to businesses and other organizations. However, the vast majority of sentiment resources and algorithms are limited to a single language, usually English (Wilson, 2008; Baccianella and Sebastiani, 2010). Since no single language captures a majority of the content online, adopting such a limited approach in an increasingly global community risks missing important details and trends that might only be available when text in multiple languages is taken into account. 45 Philip Resnik Department of Linguistics and UMIACS University of Maryland College Park, MD re snik@umd .edu Up to this point, multiple languages have been addressed in sentiment analysis primarily by transferring knowledge from a resource-rich language to a less rich language (Banea et al., 2008), or by ignoring differences in languages via translation into English (Denecke, 2008). These approaches are limited to a view of sentiment that takes place through an English-centric lens, and they ignore the potential to share information between languages. Ideally, learning sentiment cues holistically, across languages, would result in a richer and more globally consistent picture. In this paper, we introduce Multilingual Supervised Latent Dirichlet Allocation (MLSLDA), a model for sentiment analysis on a multilingual corpus. MLSLDA discovers a consistent, unified picture of sentiment across multiple languages by learning “topics,” probabilistic partitions of the vocabulary that are consistent in terms of both meaning and relevance to observed sentiment. Our approach makes few assumptions about available resources, requiring neither parallel corpora nor machine translation. The rest of the paper proceeds as follows. In Section 1, we describe the probabilistic tools that we use to create consistent topics bridging across languages and the MLSLDA model. In Section 2, we present the inference process. We discuss our set of semantic bridges between languages in Section 3, and our experiments in Section 4 demonstrate that this approach functions as an effective multilingual topic model, discovers sentiment-biased topics, and uses multilingual corpora to make better sentiment predictions across languages. Sections 5 and 6 discuss related research and discusses future work, respectively. 
6 0.31764746 102 emnlp-2010-Summarizing Contrastive Viewpoints in Opinionated Text
7 0.31747836 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
8 0.31677088 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions
9 0.31485626 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding
10 0.31423113 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks
11 0.31355989 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice
12 0.31309158 86 emnlp-2010-Non-Isomorphic Forest Pair Translation
13 0.31293368 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation
14 0.31239745 49 emnlp-2010-Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields
15 0.31112257 25 emnlp-2010-Better Punctuation Prediction with Dynamic Conditional Random Fields
16 0.31098801 103 emnlp-2010-Tense Sense Disambiguation: A New Syntactic Polysemy Task
17 0.30991185 60 emnlp-2010-Improved Fully Unsupervised Parsing with Zoomed Learning
18 0.30987495 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
19 0.30724475 80 emnlp-2010-Modeling Organization in Student Essays
20 0.30706611 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment