emnlp emnlp2013 emnlp2013-153 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nikos Engonopoulos ; Martin Villalba ; Ivan Titov ; Alexander Koller
Abstract: We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. The combined model outperforms its components in predicting reference resolution and when to give feedback.
Reference: text
sentIndex sentText sentNum sentScore
1 Predicting the resolution of referring expressions from user behavior Nikos Engonopoulos1 Martín Villalba1 Ivan Titov2 Alexander Koller1 1University of Potsdam, Germany 2University of Amsterdam, Netherlands [sent-1, score-0.339]
2 Abstract We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. [sent-9, score-0.407]
3 The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. [sent-10, score-0.131]
4 The combined model outperforms its components in predicting reference resolution and when to give feedback. [sent-11, score-0.18]
5 1 Introduction Speakers and listeners in natural communication are engaged in a highly interactive process. [sent-12, score-0.124]
6 This process is a core part of what is commonly called grounding in the dialogue literature (see e.g. [sent-15, score-0.127]
7 Interactive computer systems that are to carry out an effective and efficient conversation with a user must model this grounding process, and should ideally respond to the user’s observed behavior in real time. [sent-19, score-0.329]
8 For instance, if the user of a pedestrian navigation system takes a wrong turn, the system should interpret this as evidence of misunderstanding and bring the user back on track. [sent-20, score-0.334]
9 We focus here on the problem of predicting how the user has resolved a referring expression (RE) that was generated by the system, i.e., which object the user took the RE to refer to. [sent-21, score-0.316]
10 (2010) and Garoufi and Koller (2011) have presented log-linear models for predicting how the listener will resolve a given RE in a given scene; however, these models do not update the probability model based on observing the user’s reactions. [sent-26, score-0.272]
11 (2012) all predict what the listener understood based on their behavior, but do not consider the RE itself in the model. [sent-29, score-0.199]
12 (2013) aim at explaining the effect of implicatures on the listener’s RE resolution process in terms of hypothesized interactions, but do not actually support a real-time interaction between a system and a user. [sent-31, score-0.22]
13 In this paper, we show how to predict how the listener has resolved an RE by combining a statistical model of RE resolution based on the RE itself with a statistical model of RE resolution based on the listener’s behavior. [sent-32, score-0.387]
14 We consider the RE grounding problem in the context of interactive, situated natural language generation (NLG) for the GIVE Challenge (Koller et al. [sent-34, score-0.218]
, 2010a), where NLG systems must generate real-time instructions in virtual 3D environments. [sent-35, score-0.337]
16 We use data from the GIVE-2 and GIVE-2.5 Challenges, which contain the systems’ utterances along with the behavior of human hearers in response to these utterances. [sent-37, score-0.188]
17 We find that the combined model predicts RE resolution more accurately than each of the two component models alone. [sent-38, score-0.116]
18 We see this as a first step towards implementing an actual interactive system that performs human-like grounding based on our RE resolution model. [sent-39, score-0.341]
19 To complete the task, the IF must press a number of buttons in the correct order; these buttons are the colored boxes in Fig. [sent-44, score-0.197]
1, and are scattered all over the virtual environment. [sent-45, score-0.201]
21 The IF can move around freely in the virtual environment, but has no prior knowledge about the world. [sent-46, score-0.201]
22 To this end, the system is continuously informed about the IF’s movements and visual field, and can generate written utterances at any time. [sent-48, score-0.219]
23 Many system utterances are manipulation instructions, such as “press the blue button”, containing an RE in the form of a definite NP. [sent-53, score-0.23]
24 We call a given part of an interaction between the system and the IF an episode of that interaction if it starts with a manipulation instruction, ends with the IF performing an action (i.e. [sent-54, score-0.368]
25 pressing a button), and contains only IF movements and no further utterances in between. [sent-56, score-0.147]
26 Not all manipulation instructions initiate an episode, because the system may decide to produce further utterances (not containing REs) before the IF performs the action. [sent-57, score-0.261]
27 An NLG system will choose the RE for an instruction at runtime out of potentially many semantically valid alternatives (“the blue button”, “the button next to the chair”, “the button to the right of the red button”, etc.). [sent-58, score-1.094]
28 Ideally, it will predict which of these REs has the highest chance to be understood by the IF, given the current scene, and utter an instruction that uses this RE. [sent-60, score-0.189]
29 After uttering the manipulation instruction, the system needs to ascertain whether the IF understood the RE correctly, i.e., resolved it to the intended object. [sent-61, score-0.158]
30 A naive grounding mechanism might wait until the IF actually presses a button and check whether it was the right one. [sent-64, score-0.643]
31 However, this can make the communication ineffective (IF performs many useless actions) and risky (IF may press the wrong button and lose). [sent-66, score-0.551]
32 Thus, it is important that the system updates its prediction of how the IF resolved the RE continuously by observing the IF’s behavior, before the actual button press. [sent-67, score-0.693]
33 For instance, if the IF walks towards the target, this might reinforce the system’s belief in a correct understanding; turning away or exiting the room could be strong evidence of the opposite. [sent-68, score-0.094]
34 The system can then exploit the updated prediction to give the IF feedback (“no, the blue button”) to prevent costly mistakes. [sent-69, score-0.289]
35 We address these challenges by estimating the probability distribution over the possible objects to which the IF may resolve the RE. [sent-70, score-0.146]
36 Given an RE r generated for a∗ at time t0, the state of the world s at t0, and the observed behavior σ(t) of the user at t ≥ t0, we estimate the probability p(a | r, s, σ(t)) that the user resolved r to an object a ∈ A. [sent-73, score-0.395]
37 We compute p(a | r, s, σ) as the product of a semantic model psem(a | r, s) and an observational model pobs(a | σ). [sent-79, score-0.274]
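As an illustration of this combination step, the following Python sketch computes the normalized product of the two component distributions; the function and object names are invented for this example and are not from the paper.

def combine(p_sem, p_obs):
    # p_sem, p_obs: dicts mapping candidate objects a to probabilities,
    # i.e. psem(a | r, s) and pobs(a | sigma).
    # Returns p(a | r, s, sigma) proportional to their product.
    joint = {a: p_sem[a] * p_obs[a] for a in p_sem}
    z = sum(joint.values())
    return {a: p / z for a, p in joint.items()}

# Example: the semantic model favors button b1, but the observed
# movement favors b2; the product trades the two sources off.
print(combine({"b1": 0.6, "b2": 0.3, "b3": 0.1},
              {"b1": 0.2, "b2": 0.7, "b3": 0.1}))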
38 The feature functions we use only consider general properties of objects (such as color and distance), and not the identity of the objects themselves. [sent-82, score-0.144]
39 This means that we can train a model on one virtual environment (containing a certain set of objects), and then apply the model to another virtual environment, containing a different set of objects. [sent-83, score-0.513]
40 Semantic model The semantic model estimates for each object a in the environment the initial probability psem(a | r, s) that the IF will understand a given RE r uttered in a scene s as referring to a. [sent-84, score-0.421]
41 An RE like “the button next to the red button” might confuse the IF into pressing a red button, rather than the one meant by the system. [sent-95, score-0.542]
42 IsVisible evaluates to 1 if a is visible to the IF in s. [sent-98, score-0.157]
43 IsTargetInFront evaluates to 1 if the angular distance towards a, i.e., the angle by which the IF would need to turn to face a, falls below a threshold. [sent-100, score-0.285]
44 VisualSalience approximates the visual salience of Kelleher and van Genabith (2004), a weighted count of the number of pixels on which a is rendered (pixels near the center of the screen have higher weights). [sent-103, score-0.132]
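A plausible reading of the semantic model is the log-linear form of the prior work cited above, psem(a | r, s) ∝ exp(w · f(a, r, s)); the sketch below uses invented weights and feature values, and only the three feature names come from the text.

import math

def psem(candidates, feats, weights):
    # Log-linear model: score each candidate a by exp(w . f(a, r, s)),
    # then normalize over all candidates in the scene.
    scores = {a: math.exp(sum(w * x for w, x in zip(weights, feats[a])))
              for a in candidates}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

# Feature vectors per candidate: (IsVisible, IsTargetInFront, VisualSalience).
feats = {"b1": (1.0, 1.0, 0.8), "b2": (1.0, 0.0, 0.3), "b3": (0.0, 0.0, 0.1)}
print(psem(["b1", "b2", "b3"], feats, weights=(1.5, 2.0, 1.0)))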
45 Observational model The observational model estimates for each object a the probability pobs(a | σ) that the IF will interact with a, given the IF’s recent behavior σ(t) = (σ1, ..., σn). [sent-104, score-0.427]
46 pobs is constantly re-evaluated for times t > t0 as the IF moves around. [sent-108, score-0.183]
47 pobs uses the following features: • Linear distance features assume that the closest button is also the one the IF understood. [sent-109, score-0.237]
48 InRoom returns the number of frames σi in σ in which the IF and a are in the same room. [sent-110, score-0.128]
49 ButtonDistance returns the distance between the IF and a at σ1 divided by a constant such that the result never exceeds 1. [sent-111, score-0.139]
50 If a is neither in the same room nor visible, the feature returns 1. [sent-112, score-0.135]
51 TargetInFront returns the angular distance towards a at σ1. [sent-114, score-0.297]
52 AngleToTarget returns TargetInFront divided by π, or 1 if a is neither in the same room nor visible. [sent-115, score-0.135]
53 LinearRegAngleTo applies linear regression to a list of observed angular distances towards a over all frames σi, and returns the slope of the regression as a measure of variation. [sent-116, score-0.331]
54 If a is neither visible nor in the same room as the IF at σi, the angle is set to π. [sent-118, score-0.169]
55 Combined distance feature: a weighted sum of linear and angular distance towards a, called overall distance in Koller et al. [sent-119, score-0.222]
56 Salience features capture visual salience and its change over time. [sent-121, score-0.132]
57 Defining VSi as the result of applying the psem feature VisualSalience to σi and a, LastVisualSalience returns VSn. [sent-122, score-0.221]
58 LinearRegVisualSalience applies linear regression to all values VSi and returns the slope as a measure of change in salience. [sent-123, score-0.13]
59 • Binary features aim to detect concrete behavior patterns: LastIsVisible applies the psem feature IsVisible to σ1, and IsClose evaluates to 1 if the IF is close enough and correctly oriented to manipulate a in the GIVE environment at σ1. [sent-126, score-0.32]
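As a rough illustration of how such features could be computed from the frame sequence σ = (σ1, ..., σn), consider the sketch below. The Frame fields and the normalizing constant are assumptions made for this example; only the feature names follow the text.

from dataclasses import dataclass
import math

@dataclass
class Frame:
    distance: float   # distance from the IF to candidate a
    angle: float      # angular distance towards a, in radians
    salience: float   # weighted visible-pixel count for a
    same_room: bool

MAX_DIST = 50.0  # assumed constant so that ButtonDistance never exceeds 1

def obs_features(frames):
    f = {}
    f["InRoom"] = sum(fr.same_room for fr in frames)
    f["ButtonDistance"] = min(frames[0].distance / MAX_DIST, 1.0)
    f["AngleToTarget"] = frames[0].angle / math.pi
    # LinearRegAngleTo: least-squares slope over the observed angles,
    # indicating whether the IF is turning towards or away from a.
    xs, ys = range(len(frames)), [fr.angle for fr in frames]
    mx, my = sum(xs) / len(frames), sum(ys) / len(frames)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    f["LinearRegAngleTo"] = cov / var if var else 0.0
    f["LastVisualSalience"] = frames[-1].salience
    return f

# Example with n = 4 frames: the IF approaches a and turns towards it.
print(obs_features([Frame(10.0, 1.2, 0.0, True), Frame(8.0, 0.9, 0.1, True),
                    Frame(6.0, 0.5, 0.4, True), Frame(5.0, 0.3, 0.6, True)]))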
60 These datasets constitute interaction corpora, in which the IF’s activities in the virtual environment were recorded along with the utterances automatically generated by the participating NLG systems. [sent-131, score-0.475]
61 From the GIVE-2.5 data, we first identified moments in the recorded data where the IF pressed a button. [sent-135, score-0.137]
62 From these, we discarded all instances from the tutorial phase of the GIVE game and those that happened within 200 ms after the previous utterance, as these clearly didn’t happen in response to it. [sent-136, score-0.132]
63 This yielded 6478 training instances for pobs, each consisting of σ at 1 second before the action, and the button a which the IF pressed. [sent-137, score-0.5]
64 We chose n = 4 for representing σ, cutting σ short where necessary to ensure that the features only considered IF behavior that happened in response to the utterance. [sent-138, score-0.15]
65 Finally, we selected those instances which are episodes in the sense of Section 2, i.e. [sent-140, score-0.22]
66 those in which the last utterance before the action contained an RE r. [sent-142, score-0.153]
67 We chose GIVE-2 for testing because the mean episode length is higher (3. [sent-146, score-0.104]
68 Note that the test data and training data are based on distinct sets of three virtual environments. [sent-153, score-0.201]
69 [Figure 2: Prediction accuracy for (a) all episodes, (b) unsuccessful episodes, as a function of time before the action (sec).] [sent-155, score-0.27]
70 An example video showing our models’ predictions on some training episodes can be found at http://tinyurl. [sent-158, score-0.157]
71 Prediction accuracy We first evaluated the ability of our model to predict the button to which the IF resolved each RE. [sent-160, score-0.521]
72 We call the proportion of instances correctly classified by p(a | r, s, σ) the prediction accuracy. [sent-162, score-0.13]
73 We plot prediction accuracy as a function of the time at which the model is queried for a prediction, by evaluating at 3s, 2s, 1s, and 0s before the button press. [sent-165, score-0.504]
74 The graph is based on the 2094 test instances with an episode length of at least three seconds, to ensure that results for different prediction times are comparable. [sent-166, score-0.234]
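A sketch of this evaluation protocol, with all names invented for illustration:

def accuracy_over_time(episodes, model, offsets=(3, 2, 1, 0)):
    # episodes: (r, s, sigma_up_to, pressed) tuples, where sigma_up_to(t)
    # returns the behavior observed up to t seconds before the button press
    # and pressed is the button the IF actually pressed.
    acc = {}
    for t in offsets:
        correct = 0
        for r, s, sigma_up_to, pressed in episodes:
            dist = model(r, s, sigma_up_to(t))   # p(a | r, s, sigma)
            if max(dist, key=dist.get) == pressed:
                correct += 1
        acc[t] = correct / len(episodes)
    return acc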
75 Furthermore, the combined model outperforms both psem and pobs reliably. [sent-168, score-0.319]
76 Our model also outperforms two more baselines: KGSC predicts that the IF will press the button with the minimal overall distance, which is the distance metric used by the “movement-based system” of Koller et al. [sent-170, score-0.569]
77 (2012); random visible selects a random button from the ones that are currently visible to the IF. [sent-171, score-0.605]
78 The fact that this last baseline does not approach 1 at action time suggests that multiple buttons tend to be visible when the IF presses one, confirming that the prediction task is not trivial. [sent-172, score-0.346]
79 Correctly predicting the button that the IF will press is especially useful, and challenging, in those cases where the IF pressed a different button than the one the NLG system intended. [sent-173, score-1.059]
80 Fig. 2b shows a closer look at the 125 unsuccessful episodes of at least three seconds in the test data. [sent-175, score-0.198]
81 However, by integrating semantic and observational information, the combined model compensates better for this than all other systems, with an accuracy of 37. [sent-177, score-0.091]
82 Feedback appropriateness Second, we evaluated the ability of our model to predict whether the user misunderstood the RE and requires feedback. [sent-180, score-0.173]
83 For all the above models, we assumed a simple feedback mechanism which predicts that the user misunderstood the RE if p(a0) − p(a∗) > θ for some object a0 ≠ a∗, where θ is a confidence threshold; we used θ = 0. [sent-181, score-0.437]
84 We can thus test on recorded data in which no actual feedback can be given anymore. [sent-183, score-0.167]
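In code, this threshold rule amounts to the following sketch; only the condition p(a0) − p(a∗) > θ comes from the text, the rest is illustrative.

def needs_feedback(p, target, theta=0.0):
    # p: distribution over candidate objects; target: intended referent a*.
    # Suggest feedback when some other object overtakes the target
    # by more than the confidence threshold theta.
    best_other = max(v for a, v in p.items() if a != target)
    return best_other - p[target] > theta

# e.g. needs_feedback({"b1": 0.3, "b2": 0.55, "b3": 0.15}, "b1") returns True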
85 We evaluated the models on the 848 test episodes of at least 3s in which the NLG systems logged the button they tried to refer to. [sent-184, score-0.594]
86 Here precision is the proportion of instances in which the IF pressed the wrong button (i.e. [sent-187, score-0.631]
87 where feedback should have been given) among the instances where the model actually suggested feedback. [sent-189, score-0.184]
88 Recall is the proportion of instances in which the model suggested feedback among the instances where the IF pressed the wrong button. [sent-190, score-0.378]
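These two proportions can be computed directly from per-episode outcomes; a minimal sketch with invented names:

def feedback_precision_recall(outcomes):
    # outcomes: (suggested, wrong_press) boolean pairs, one per test episode.
    tp = sum(1 for s, w in outcomes if s and w)
    suggested = sum(1 for s, w in outcomes if s)
    wrong = sum(1 for s, w in outcomes if w)
    precision = tp / suggested if suggested else 0.0
    recall = tp / wrong if wrong else 0.0
    return precision, recall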
89 The difference is particularly pronounced early on, which would be useful in giving timely feedback in an actual real-time system. [sent-192, score-0.121]
90 5 Conclusion and future work We presented a statistical model for predicting how a user will resolve the REs generated by an interactive, situated NLG system. [sent-194, score-0.298]
91 It outperforms its components and two baselines on prediction and feedback accuracy. [sent-196, score-0.188]
92 Our model captures a real-time grounding process on the part of the interactive system. [sent-197, score-0.218]
93 We thus believe that it provides a solid foundation for detecting misunderstandings and generating suitable feedback in an end-to-end dialogue system. [sent-198, score-0.209]
94 We have presented our model in terms of a situated dialogue setting, where clues about what the hearer understood can be observed directly. [sent-199, score-0.249]
95 The immediate next step for future research is to extend our model to an implemented end-to-end situated NLG system for the GIVE Challenge, and evaluate whether this actually improves task performance. [sent-202, score-0.091]
96 We will furthermore improve pobs by switching to a more temporally dynamic probability model. [sent-204, score-0.183]
97 Visual salience and reference resolution in simulated 3-D environments. [sent-247, score-0.162]
98 Predicting evidence of understanding by monitoring user’s task manipulation in multimodal conversations. [sent-266, score-0.104]
99 Uncertainty, utility, and misunderstanding: A decision-theoretic perspective on grounding in conversational systems. [sent-270, score-0.127]
100 A computational theory of grounding in natural language conversation. [sent-279, score-0.127]
wordName wordTfidf (topN-words)
[('button', 0.437), ('nlg', 0.29), ('virtual', 0.201), ('koller', 0.191), ('pobs', 0.183), ('re', 0.172), ('episodes', 0.157), ('listener', 0.145), ('psem', 0.136), ('instruction', 0.135), ('garoufi', 0.131), ('grounding', 0.127), ('feedback', 0.121), ('user', 0.121), ('angular', 0.114), ('environment', 0.111), ('konstantina', 0.105), ('episode', 0.104), ('manipulation', 0.104), ('interactive', 0.091), ('situated', 0.091), ('pressed', 0.091), ('observational', 0.091), ('returns', 0.085), ('visible', 0.084), ('instructions', 0.084), ('resolved', 0.084), ('striegnitz', 0.083), ('salience', 0.083), ('behavior', 0.081), ('utterance', 0.081), ('resolution', 0.079), ('buttons', 0.078), ('utterances', 0.073), ('evaluates', 0.073), ('objects', 0.072), ('object', 0.072), ('action', 0.072), ('hearer', 0.068), ('prediction', 0.067), ('continuously', 0.064), ('instances', 0.063), ('referring', 0.058), ('res', 0.055), ('understood', 0.054), ('distance', 0.054), ('predicting', 0.053), ('blue', 0.053), ('cassell', 0.052), ('gargett', 0.052), ('isvisible', 0.052), ('justine', 0.052), ('misunderstanding', 0.052), ('misunderstandings', 0.052), ('misunderstood', 0.052), ('nakano', 0.052), ('paek', 0.052), ('realtime', 0.052), ('targetinfront', 0.052), ('visualsalience', 0.052), ('vsi', 0.052), ('alexander', 0.052), ('room', 0.05), ('visual', 0.049), ('give', 0.048), ('recorded', 0.046), ('implicatures', 0.045), ('presses', 0.045), ('buschmeier', 0.045), ('enlg', 0.045), ('kelleher', 0.045), ('slope', 0.045), ('scene', 0.044), ('towards', 0.044), ('interaction', 0.044), ('challenge', 0.043), ('frames', 0.043), ('donna', 0.041), ('golland', 0.041), ('environments', 0.041), ('unsuccessful', 0.041), ('pressing', 0.041), ('press', 0.041), ('challenges', 0.041), ('observing', 0.041), ('wrong', 0.04), ('johanna', 0.039), ('mart', 0.039), ('predicts', 0.037), ('byron', 0.036), ('dialogue', 0.036), ('happened', 0.035), ('angle', 0.035), ('kristina', 0.034), ('response', 0.034), ('mechanism', 0.034), ('movements', 0.033), ('communication', 0.033), ('resolve', 0.033), ('red', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
Author: Nikos Engonopoulos ; Martin Villalba ; Ivan Titov ; Alexander Koller
Abstract: We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. The combined model outperforms its components in predicting reference resolution and when to give feedback.
2 0.15272862 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
3 0.089733578 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer
Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.
4 0.081295304 78 emnlp-2013-Exploiting Language Models for Visual Recognition
Author: Dieu-Thu Le ; Jasper Uijlings ; Raffaella Bernardi
Abstract: The problem of learning language models from large text corpora has been widely studied within the computational linguistic community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models are compatible to the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows that the language models built from general text corpora can be used instead of expensive annotated images and even outperform the image model when testing on a big general dataset.
Author: Mikhail Ageev ; Dmitry Lagun ; Eugene Agichtein
Abstract: Passage retrieval is a crucial first step of automatic Question Answering (QA). While existing passage retrieval algorithms are effective at selecting document passages most similar to the question, or those that contain the expected answer types, they do not take into account which parts of the document the searchers actually found useful. We propose, to the best of our knowledge, the first successful attempt to incorporate searcher examination data into passage retrieval for question answering. Specifically, we exploit detailed examination data, such as mouse cursor movements and scrolling, to infer the parts of the document the searcher found interesting, and then incorporate this signal into passage retrieval for QA. Our extensive experiments and analysis demonstrate that our method significantly improves passage retrieval, compared to using textual features alone. As an additional contribution, we make available to the research community the code and the search behavior data used in this study, with the hope of encouraging further research in this area.
7 0.051991988 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication
8 0.049955867 4 emnlp-2013-A Dataset for Research on Short-Text Conversations
9 0.04833208 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
11 0.047510948 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
12 0.043456331 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
13 0.043029774 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
14 0.042709336 98 emnlp-2013-Image Description using Visual Dependency Representations
15 0.039261799 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
16 0.03883804 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
17 0.0377995 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
18 0.03685195 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
19 0.036143217 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction
20 0.035554953 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
topicId topicWeight
[(0, -0.128), (1, 0.04), (2, -0.012), (3, 0.025), (4, -0.036), (5, 0.096), (6, -0.002), (7, -0.008), (8, -0.041), (9, -0.008), (10, -0.226), (11, 0.014), (12, 0.051), (13, 0.009), (14, 0.016), (15, 0.009), (16, 0.01), (17, 0.016), (18, 0.026), (19, 0.028), (20, 0.105), (21, 0.01), (22, 0.072), (23, 0.008), (24, 0.02), (25, -0.081), (26, 0.0), (27, -0.14), (28, -0.131), (29, 0.103), (30, -0.063), (31, -0.047), (32, 0.03), (33, 0.039), (34, 0.077), (35, -0.03), (36, -0.143), (37, -0.052), (38, -0.14), (39, -0.077), (40, -0.008), (41, 0.107), (42, 0.015), (43, -0.028), (44, -0.01), (45, -0.049), (46, -0.006), (47, 0.057), (48, 0.166), (49, 0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.95034844 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
Author: Nikos Engonopoulos ; Martin Villalba ; Ivan Titov ; Alexander Koller
Abstract: We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. The combined model outperforms its components in predicting reference resolution and when to give feedback.
2 0.80485803 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
3 0.48998183 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer
Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.
4 0.45979634 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
Author: Anais Cadilhac ; Nicholas Asher ; Farah Benamara ; Alex Lascarides
Abstract: This paper describes a method that predicts which trades players execute during a winlose game. Our method uses data collected from chat negotiations of the game The Settlers of Catan and exploits the conversation to construct dynamically a partial model of each player’s preferences. This in turn yields equilibrium trading moves via principles from game theory. We compare our method against four baselines and show that tracking how preferences evolve through the dialogue and reasoning about equilibrium moves are both crucial to success.
5 0.44942081 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu
Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classifiers considerably in word sense disambiguation and document classification tasks.
6 0.40779015 78 emnlp-2013-Exploiting Language Models for Visual Recognition
8 0.39354458 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers
9 0.36914909 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems
11 0.34072641 155 emnlp-2013-Question Difficulty Estimation in Community Question Answering Services
12 0.31491795 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
13 0.30060992 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication
14 0.29855418 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries
16 0.2764205 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
18 0.25915268 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
19 0.25903437 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
20 0.25352466 116 emnlp-2013-Joint Parsing and Disfluency Detection in Linear Time
topicId topicWeight
[(3, 0.064), (18, 0.02), (22, 0.032), (30, 0.049), (45, 0.012), (47, 0.014), (50, 0.036), (51, 0.158), (53, 0.013), (66, 0.03), (71, 0.023), (75, 0.029), (90, 0.014), (94, 0.375), (96, 0.027), (97, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.76614231 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
Author: Nikos Engonopoulos ; Martin Villalba ; Ivan Titov ; Alexander Koller
Abstract: We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. The combined model outperforms its components in predicting reference resolution and when to give feedback.
2 0.45440239 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
Author: Andrew J. Anderson ; Elia Bruni ; Ulisse Bordignon ; Massimo Poesio ; Marco Baroni
Abstract: Traditional distributional semantic models extract word meaning representations from cooccurrence patterns of words in text corpora. Recently, the distributional approach has been extended to models that record the cooccurrence of words with visual features in image collections. These image-based models should be complementary to text-based ones, providing a more cognitively plausible view of meaning grounded in visual perception. In this study, we test whether image-based models capture the semantic patterns that emerge from fMRI recordings of the neural signal. Our results indicate that, indeed, there is a significant correlation between image-based and brain-based semantic similarities, and that image-based models complement text-based ones, so that the best correlations are achieved when the two modalities are combined. Despite some unsatisfactory, but explained outcomes (in particular, failure to detect differential association of models with brain areas), the results show, on the one hand, that image-based distributional semantic models can be a precious new tool to explore semantic representation in the brain, and, on the other, that neural data can be used as the ultimate test set to validate artificial semantic models in terms of their cognitive plausibility.
4 0.43167394 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity
Author: Eduardo Blanco ; Dan Moldovan
Abstract: This paper presents a novel approach to determine textual similarity. A layered methodology to transform text into logic forms is proposed, and semantic features are derived from a logic prover. Experimental results show that incorporating the semantic structure of sentences is beneficial. When training data is unavailable, scores obtained from the logic prover in an unsupervised manner outperform supervised methods.
5 0.4308106 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
Author: Yiping Jin ; Min-Yen Kan ; Jun-Ping Ng ; Xiangnan He
Abstract: This paper presents DefMiner, a supervised sequence labeling system that identifies scientific terms and their accompanying definitions. DefMiner achieves 85% F1 on a Wikipedia benchmark corpus, significantly improving the previous state-of-the-art by 8%. We exploit DefMiner to process the ACL Anthology Reference Corpus (ARC) – a large, real-world digital library of scientific articles in computational linguistics. The resulting automatically-acquired glossary represents the terminology defined over several thousand individual research articles. We highlight several interesting observations: more definitions are introduced for conference and workshop papers over the years and that multiword terms account for slightly less than half of all terms. Obtaining a list of popular, defined terms in a corpus of computational linguistics papers, we find that concepts can often be categorized into one of three categories: resources, methodologies and evaluation metrics.
6 0.42920285 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
7 0.42764729 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
8 0.4276275 152 emnlp-2013-Predicting the Presence of Discourse Connectives
9 0.42676845 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
10 0.42668498 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
11 0.42665541 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
12 0.42569157 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
13 0.42544979 86 emnlp-2013-Feature Noising for Log-Linear Structured Prediction
14 0.42490855 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
15 0.42490026 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
16 0.42484581 26 emnlp-2013-Assembling the Kazakh Language Corpus
17 0.42451257 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
18 0.4242174 73 emnlp-2013-Error-Driven Analysis of Challenges in Coreference Resolution
19 0.42419592 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach
20 0.42392591 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation