emnlp emnlp2013 emnlp2013-185 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
Reference: text
sentIndex sentText sentNum sentScore
1 In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. [sent-4, score-0.745]
2 Thus referring expression generation (REG) will need to take this discrepancy into consideration. [sent-6, score-0.376]
3 To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. [sent-7, score-0.336]
4-5 However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). [sent-9, score-0.433; sent-11, score-0.498]
6 This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue. [sent-14, score-0.724]
7 Situated human-robot dialogue has received increasing attention in recent years. [sent-15, score-0.219]
8 In situated dialogue, robots/artificial agents and their human partners are co-present in a shared physical world. [sent-16, score-0.485]
9 Due to its limited perceptual and reasoning capabilities, the robot’s representation of the shared world is often incomplete, error-prone, and significantly mismatched with that of its human partner. [sent-18, score-0.673]
10 Although the human and the robot are physically co-present, a joint perceptual basis between them cannot be established (Clark and Brennan, 1991). [sent-19, score-0.512]
11 Thus, referential communication between the human and the robot becomes difficult. [sent-20, score-0.329]
12 How this mismatched perceptual basis affects referential communication in situated dialogue was investigated in our previous work (Liu et al. [sent-21, score-1.191]
13 In that work, the main focus was on reference resolution: given referential descriptions from human partners, how to identify referents in the environment even though the robot only has imperfect perception of the environment. [sent-23, score-0.674]
14 Since robots need to collaborate with human partners to establish a joint perceptual basis, referring expression generation (REG) becomes an equally important problem in situated dialogue. [sent-24, score-1.162]
15 Robots have much weaker capabilities for perceiving the environment than humans. [sent-25, score-0.526]
16 How can a robot effectively generate referential descriptions about the environment so that its human partner can understand which objects are being referred to? [sent-26, score-0.693]
17 There has been a tremendous amount of work on referring expression generation in the last two decades (Dale, 1995; Krahmer and van Deemter, 2012). [sent-27, score-0.376]
18 However, this assumption no longer holds in situated dialogue with robots. [sent-33, score-0.336]
19 First, perfect knowledge of the environment is not available to the agent ahead of time. [sent-35, score-0.283]
20 The agent needs to automatically make inferences to connect recognized lower-level visual features with … [Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18-21 October 2013.] [sent-36, score-0.308]
21 Second, in situated dialogue the agent and the human have mismatched representations of the environment. [sent-40, score-0.753]
22 Given these two distinctions, it is not clear whether state-of-the-art REG approaches are applicable under a mismatched perceptual basis in situated dialogue. [sent-42, score-0.877]
23 To address this issue, this paper revisits the problem of REG in the context of a mismatched perceptual basis. [sent-43, score-0.582]
24 We further extended the regular graph representation into a hypergraph representation to account for group-based spatial relations that are important for visual descriptions (Dhande, 2003; Tenbrink and Moratz, 2003; Funakoshi et al. [sent-48, score-0.647]
25-26 However, while our approach performs effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), it performs poorly under a mismatched perceptual basis (e.g., 45%). [sent-52, score-0.433; sent-54, score-0.643]
27 This performance gap calls for new REG solutions that are capable of mediating a mismatched perceptual basis. [sent-57, score-0.638]
28 In the following sections, we first describe our hypergraph-based representations and illustrate how uncertainties from automated perception can be incorporated. [sent-58, score-0.328]
29 We then describe an empirical study using Amazon Mechanical Turk to evaluate the generated referring expressions. [sent-59, score-0.225]
30 Most of these approaches assume the agent has access to a complete symbolic representation of the domain. [sent-63, score-0.285]
31 Recently, there has been increasing interest in REG for visual objects (Roy, 2002; Golland et al. [sent-65, score-0.308]
32 , 2010) uses visual scenes that are generated by computer graphics and thus the internal representation of the scene is known. [sent-69, score-0.341]
33 Some other work focuses on the connection between lower-level visual features and symbolic descriptors for REG (Roy, 2002; Mitchell et al. [sent-70, score-0.266]
34 It is well established that automated recognition of visual scenes is extremely challenging. [sent-73, score-0.186]
35 It is not clear whether the existing approaches can be extended to the situation where the agent has imperfect perception of the shared environment. [sent-75, score-0.442]
36 Earlier work by Horacek (2005) looked into the problem of mismatched knowledge between conversation partners for REG. [sent-76, score-0.406]
37 We are interested in REG under a mismatched perceptual basis between conversation partners, where the agent has imperfect perception and knowledge of the shared environment. [sent-81, score-1.133]
38 , 2003) and extended it to incorporate group spatial relations and uncertainties associated with automated perception of the environment. [sent-83, score-0.578]
39-40 Figure 1: An original scene and its impoverished scene processed by a CV algorithm. Towards mediating a shared perceptual basis in situated dialogue, our previous work (Liu et al., 2012) conducted experiments to study referential communication between partners with mismatched perceptual capabilities. [sent-86, score-1.339; sent-87, score-0.904]
41 We simulated mismatched capabilities by making an original scene (Figure 1(a)) available to a director (simulating higher perceptual calibre) and a corresponding impoverished scene (Figure 1(b)) available to a matcher (simulating lowered perceptual calibre). [sent-88, score-1.553]
42 The impoverished scene is created by re-rendering the results of automatically recognizing the original scene with a CV algorithm. [sent-89, score-0.474]
43 An example of the original scene and an impoverished scene is shown in Figure 1. [sent-90, score-0.427]
44 Using this setup, the director and the matcher were instructed to collaborate with each other on some naming games. [sent-91, score-0.191]
45 Through these games, they collected data on how partners with mismatched perceptual capabilities collaborate to ground their referential communication. [sent-92, score-0.994]
46 , 2012) is intended to simulate situated dialogue between a human (like the director) and a robot (like the matcher). [sent-94, score-0.453]
47 The robot has a significantly lowered ability in perception and reasoning. [sent-95, score-0.267]
48 The robot’s internal representation of the shared world will be much like the impoverished scene, which contains many recognition errors. [sent-96, score-0.349]
49 , 2013) shows that different strategies were used by conversation partners to produce referential descriptions. [sent-99, score-0.323]
50 Binary spatial relations are sometimes insufficient to describe the target object, so the matcher must resort to group information to distinguish the target object from the rest of the objects. [sent-104, score-0.378]
51 For example, if the matcher needs to describe target object 5 in Figure 1(b), he/she may have to start by indicating the group of three objects at the bottom and then specify the relationship (i. [sent-105, score-0.393]
52 While the original graph-based approach can effectively represent attributes and binary relations between objects (Krahmer et al. [sent-111, score-0.26]
53 Therefore, to address the low perceptual capabilities of artificial agents, we introduce hypergraphs to represent the shared environment. [sent-113, score-0.544]
54 Our approach has two unique characteristics compared to previous graph-based approaches: (1) A hypergraph representation is more general than a regular graph. [sent-114, score-0.207]
55 (2) Unlike previous work, here the generation of hypergraphs is completely driven by automated perception of the environment. [sent-116, score-0.366]
56 This is done by incorporating uncertainties in perception and reasoning into cost functions associated with graphs. [sent-117, score-0.281]
57 While regular graphs are commonly used to represent binary relations between two nodes, hypergraphs provide a more general representation for n-ary relations among multiple nodes. [sent-125, score-0.254]
58 We use hypergraphs to represent the agent’s perceived physical environment (also called scene hypergraphs). [sent-126, score-0.452]
59 , color, size, type information) or a group of objects (e. [sent-130, score-0.214]
60 Hyperarcs are also used to capture spatial relations between any two subsets of nodes: between two objects, between two groups of objects, or between one or more objects and the group they belong to. [sent-133, score-0.419]
61 For example, Figure 2 shows a hypergraph created for part of the impoverished scene shown in Figure 1(b) (i. [sent-134, score-0.387]
62 One important characteristic is that, because the graph is created based on an automated vision recognition system, the values of an attribute or a relation in the hypergraph are numeric (except for the type attribute). [sent-137, score-0.279]
63 For example, the value of the color attribute is the RGB distribution extracted from the corresponding visual object; the value of the size attribute is the width and height of the bounding box; and the value of the location attribute is a function of spatial coordinates. [sent-138, score-0.611]
64 The perceived visual scene can be represented as a complete hypergraph, in which every pair of node subsets is connected by a hyperarc. [sent-143, score-0.403]
65 , hyperarcs), we only retain those relations that are likely to be used by humans to produce referring expressions, based on two heuristics. [sent-147, score-0.32]
66 The first heuristic is based on perceptual principles, also called the Gestalt Laws of perception (Sternberg, 2003), which describe how people group visually similar objects into entities or groups. [sent-148, score-0.698]
67 Two well-known principles of perceptual grouping are proximity and similarity (Wertheimer, 1938): objects that lie close together are often perceived as groups; objects of similar shape, size or color are more likely to form groups than objects differing along these dimensions. [sent-149, score-1.1]
68 Based on these two principles, previous work has developed different algorithms for perceptual grouping (Thórisson, 1994; Gatt, 2006). [sent-150, score-0.375]
69 Given the results from spatial grouping, we only retain hyperarcs that represent spatial relations between two objects, between two perceived groups, between one object and a perceived group, or between one object and the group it belongs to. [sent-152, score-0.841]
70 For example, when referring to the stapler (object 9 in Figure 1(a)), people are more likely to say “the stapler above the battery” than “the stapler above the cellphone”. [sent-155, score-0.393]
71 Based on this observation, we prune the hypergraphs by retaining only the hyperarcs between an object and its closest relatum in each possible orientation. [sent-156, score-0.235]
72 Figure 2 shows the resulting hypergraph for representing a subset of objects (7, 8, 9, 11, and 13) in Figure 1(a). [sent-157, score-0.298]
73 Figure 2: An example of a hypergraph representing the perceived scene (a partial scene including only objects 7, 8, 9, 11, and 13 from Figure 1(a)). [sent-158, score-0.519]
74 As mentioned earlier, the values of object attributes and relations are numerical in nature. [sent-160, score-0.26]
75 In order for the agent to generate natural language descriptions, the first step is to assign symbolic labels or descriptors to those attributes and relations. [sent-161, score-0.336]
76 Grounded semantics provides a bridge to connect symbolic labels or words with lower-level visual features (Harnad, 1990). [sent-165, score-0.222]
77 , identifying visual objects in the environment given language descriptions (Dhande, 2003; Gorniak and Roy, 2004; Tenbrink and Moratz, 2003; Siebert and Schlangen, 2008; Liu et al. [sent-168, score-0.485]
78 For the referring expression generation task here, we also need a lexicon with grounded semantics. [sent-170, score-0.442]
79 For spatial relation terms such as above, below, left, and right, the semantic grounding functions take both the vertical and horizontal coordinates of the two objects, as follows: spatialRel: $\mathrm{above}(a, b) = f_{\mathrm{above}}(\vec{v}^{\,loc}_{a}, \vec{v}^{\,loc}_{b})$ [sent-178, score-0.199]
80 This suggests that extra effort in REG could help mediate mismatched perceptions in situated dialogue. [sent-182, score-0.56]
81 5 Conclusion In situated dialogue, humans and agents have mismatched perceptions of the shared environment. [sent-184, score-0.708]
82 To facilitate successful referential communication between a human and an agent, the agent needs to take such discrepancies into consideration and generate referential descriptions that can be understood by its human partner. [sent-185, score-0.609]
83 With this in mind, we revisited the problem of referring expression generation in the context of mismatched perceptions between humans and agents. [sent-186, score-0.709]
84 Our empirical results have shown that, to address the agent’s limited perceptual capability, REG algorithms will need to take into account the uncertainties in perception and reasoning. [sent-189, score-0.615]
85 Group-based information appears more reliable and thus should be modeled by an approach that deals with automated perception of spatially rich scenes. [sent-190, score-0.197]
86 While graph-based approaches have been shown to be effective when the agent has the same complete knowledge of the environment as its human partner, these approaches are often inadequate when humans and agents have mismatched representations of the shared world. [sent-191, score-0.602]
87 Our empirical results here call for new solutions to address the mismatched perceptual basis. [sent-192, score-0.582]
88 Previous work indicated that referential communication is a collaborative process (Clark and Wilkes-Gibbs, 1986; Heeman and Hirst, 1995). [sent-193, score-0.25]
89 For a mismatched perceptual basis, a potential solution should therefore go beyond the objective of generating a minimal description, towards a collaborative model that incorporates immediate feedback from the conversation partner (Edmonds, 1994). [sent-195, score-0.485]
90 A conceptual graph approach to the generation of referring expressions. [sent-223, score-0.32]
91 Computational interpretations of the Gricean maxims in the generation of referring expressions. [sent-236, score-0.32]
92 A computational model to connect gestalt perception and natural language. [sent-240, score-0.206]
93 Generation of relative referring expressions based on perceptual grouping. [sent-250, score-0.606]
94 Attribute selection for referring expression generation: new algorithms and evaluation methods. [sent-262, score-0.281]
95 Evaluating algorithms for the generation of referring expressions using a balanced corpus. [sent-267, score-0.367]
96 Incremental generation of spatial referring expressions in situated dialog. [sent-309, score-0.755]
97 A computational model for color naming and describing color composition of images. [sent-346, score-0.246]
98 A simple method for resolution of definite reference in a shared visual context. [sent-358, score-0.197]
99 Does size matter – how much data is required to train a REG algorithm? [sent-380, score-0.198]
100 The use of spatial relations in referring expression generation. [sent-390, score-0.486]
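The scene hypergraph described in the sentences above (nodes carrying numeric attribute values, hyperarcs linking two subsets of nodes) can be pictured as a small data structure. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the class names (`ObjectNode`, `Hyperarc`, `SceneHypergraph`), the use of a mean RGB value in place of a full RGB distribution, the centroid-based location, and the example objects and coordinates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class ObjectNode:
    """A perceived object. Attribute values are numeric, except the type label."""
    obj_id: int
    obj_type: str                      # symbolic type label from the recognizer
    rgb: Tuple[float, float, float]    # hypothetical: mean RGB instead of a distribution
    size: Tuple[float, float]          # bounding-box width and height
    loc: Tuple[float, float]           # bounding-box centre (x, y); y grows downward


@dataclass
class Hyperarc:
    """A relation from one subset of nodes (tail) to another subset (head)."""
    tail: FrozenSet[int]
    head: FrozenSet[int]
    relation: str
    cost: float = 0.0                  # placeholder for a perception-uncertainty cost


@dataclass
class SceneHypergraph:
    nodes: Dict[int, ObjectNode] = field(default_factory=dict)
    arcs: List[Hyperarc] = field(default_factory=list)

    def add_node(self, node: ObjectNode) -> None:
        self.nodes[node.obj_id] = node

    def add_arc(self, tail: Iterable[int], head: Iterable[int],
                relation: str, cost: float = 0.0) -> Hyperarc:
        t, h = frozenset(tail), frozenset(head)
        # Both ends must be non-empty sets of nodes already in the graph.
        if not t or not h or not (t | h).issubset(self.nodes):
            raise ValueError("hyperarc ends must be non-empty sets of known nodes")
        arc = Hyperarc(t, h, relation, cost)
        self.arcs.append(arc)
        return arc

    def arcs_from(self, obj_id: int) -> List[Hyperarc]:
        """All hyperarcs whose tail is exactly the single object `obj_id`."""
        return [a for a in self.arcs if a.tail == frozenset({obj_id})]


# Hypothetical objects loosely modelled on the stapler/battery example.
g = SceneHypergraph()
g.add_node(ObjectNode(9, "stapler", (40.0, 40.0, 40.0), (60.0, 25.0), (100.0, 50.0)))
g.add_node(ObjectNode(7, "battery", (200.0, 40.0, 40.0), (30.0, 15.0), (100.0, 120.0)))
g.add_node(ObjectNode(8, "cellphone", (20.0, 20.0, 20.0), (40.0, 70.0), (140.0, 125.0)))
g.add_arc({9}, {7}, "above")            # binary relation: both ends are singletons
g.add_arc({9}, {7, 8}, "above")         # object-to-group relation (a true hyperarc)
```

A binary relation is simply a hyperarc whose two ends are singletons, which is one way to see why a hypergraph generalizes the regular graph used by earlier graph-based REG.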
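The proximity principle of perceptual grouping can be approximated, for illustration only, by single-link clustering of object centroids: two objects end up in the same group if they are connected by a chain of neighbours that are closer than a distance threshold. This hypothetical `proximity_groups` sketch ignores the similarity principle, is not the algorithm of Thórisson (1994) or Gatt (2006), and the threshold is an arbitrary assumption.

```python
import math
from typing import Dict, List, Tuple


def proximity_groups(centroids: Dict[int, Tuple[float, float]],
                     threshold: float) -> List[List[int]]:
    """Single-link grouping: objects chained by gaps below `threshold` share a group."""
    parent = {i: i for i in centroids}   # union-find forest over object ids

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    ids = sorted(centroids)
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if math.dist(centroids[a], centroids[b]) < threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three objects in a row and one far away (coordinates are invented).
print(proximity_groups({1: (0, 0), 2: (10, 0), 3: (20, 0), 4: (100, 100)}, 15.0))
```

Only groups with two or more members would be candidates for group nodes in the hypergraph; singletons remain plain object nodes.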
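The second pruning heuristic (for each object and each orientation, keep only the hyperarc to its closest relatum) can be sketched as follows. Assumptions not specified in the text: orientation is decided by the dominant axis of the displacement between centroids, image y grows downward, Euclidean centroid distance stands in for "closest", and only object-to-object relations are considered. The function names and coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def orientation(target: Point, relatum: Point) -> str:
    """Direction of `target` relative to `relatum`, by dominant axis (image y points down)."""
    dx, dy = target[0] - relatum[0], target[1] - relatum[1]
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"


def closest_relata(centroids: Dict[int, Point]) -> Dict[int, Dict[str, int]]:
    """For each object, keep only the nearest relatum in each orientation."""
    kept: Dict[int, Dict[str, int]] = {}
    for t, pt in centroids.items():
        best: Dict[str, Tuple[float, int]] = {}
        for r, pr in centroids.items():
            if r == t:
                continue
            o = orientation(pt, pr)
            d = math.dist(pt, pr)
            if o not in best or d < best[o][0]:
                best[o] = (d, r)            # nearer relatum wins for this orientation
        kept[t] = {o: r for o, (_, r) in best.items()}
    return kept


# Stapler (9) above a battery (7), which is above a cellphone (8); invented coordinates.
pruned = closest_relata({9: (100, 50), 7: (100, 120), 8: (100, 250)})
print(pruned[9])
```

With these made-up positions the stapler keeps only its "above" relation to the battery, mirroring the preference for "the stapler above the battery" over "the stapler above the cellphone".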
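Assigning a symbolic descriptor to a numeric attribute can be illustrated with a nearest-prototype lexicon entry for color. This is a hypothetical sketch: the prototype RGB values, the distance-to-confidence mapping (a softmax over negative distances with an arbitrary temperature), and the `-log(confidence)` cost are assumptions chosen for illustration. They stand in for the uncertainty-aware cost functions mentioned in the text rather than reproducing them.

```python
import math
from typing import Dict, Tuple

RGB = Tuple[float, float, float]

# Hypothetical prototype colours; a real lexicon would be learned or calibrated.
COLOR_PROTOTYPES: Dict[str, RGB] = {
    "red": (220.0, 30.0, 30.0),
    "green": (30.0, 160.0, 60.0),
    "blue": (30.0, 60.0, 200.0),
    "black": (20.0, 20.0, 20.0),
    "white": (235.0, 235.0, 235.0),
}


def color_descriptor(mean_rgb: RGB,
                     prototypes: Dict[str, RGB] = COLOR_PROTOTYPES,
                     temperature: float = 50.0) -> Tuple[str, float]:
    """Return the nearest colour word and a confidence in (0, 1]."""
    dists = {name: math.dist(mean_rgb, proto) for name, proto in prototypes.items()}
    best = min(dists, key=dists.get)
    # Closer prototypes get exponentially larger weights (softmax over -distance).
    weights = {name: math.exp(-d / temperature) for name, d in dists.items()}
    return best, weights[best] / sum(weights.values())


def descriptor_cost(confidence: float) -> float:
    """Uncertain descriptors cost more: -log(confidence), 0 for full confidence."""
    return -math.log(confidence)


word, conf = color_descriptor((210.0, 40.0, 35.0))
print(word, round(conf, 3), round(descriptor_cost(conf), 3))
```

Under this kind of mapping, a search over the hypergraph that minimizes total cost would prefer descriptors the agent perceives confidently, which is one plausible way to let perceptual uncertainty influence which attributes end up in a referring expression.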
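The grounding function for "above" takes both the horizontal and vertical coordinates of the two objects, but the source's definition is cut off after f_above(v_a_loc, v_b_loc). The sketch below therefore substitutes a common angular formulation, which is an assumption and not the paper's formula: the score is the cosine of the angle between the displacement from b to a and the upward direction, clipped at 0. The function name `f_above` mirrors the notation in the text; everything else is hypothetical.

```python
import math
from typing import Tuple


def f_above(loc_a: Tuple[float, float], loc_b: Tuple[float, float]) -> float:
    """Hypothetical grounding score in [0, 1] for 'a is above b'.

    Uses image coordinates (y grows downward), so 'up' is the -y direction.
    The score is the cosine between the b->a displacement and straight up,
    clipped at 0, so it depends on both horizontal and vertical offsets.
    """
    dx = loc_a[0] - loc_b[0]
    dy = loc_a[1] - loc_b[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return 0.0                      # coincident centroids: no direction
    cos_up = -dy / norm                 # dot product with the unit vector (0, -1)
    return max(0.0, cos_up)


print(f_above((100, 50), (100, 120)))   # directly above
print(f_above((170, 50), (100, 120)))   # diagonally above and to the right
print(f_above((100, 200), (100, 120)))  # below, so the score is clipped to 0
```

A graded score like this, rather than a hard yes/no, is what allows the uncertainty of a spatial descriptor to be carried into a cost function.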
wordName wordTfidf (topN-words)
[('perceptual', 0.334), ('mismatched', 0.248), ('situated', 0.234), ('referring', 0.225), ('reg', 0.198), ('objects', 0.169), ('scene', 0.169), ('agent', 0.169), ('referential', 0.165), ('krahmer', 0.165), ('spatial', 0.154), ('perception', 0.15), ('visual', 0.139), ('gatt', 0.133), ('uncertainties', 0.131), ('hypergraph', 0.129), ('color', 0.123), ('robot', 0.117), ('environment', 0.114), ('vcolor', 0.112), ('partners', 0.11), ('dialogue', 0.102), ('generation', 0.095), ('perceived', 0.095), ('emiel', 0.093), ('funakoshi', 0.093), ('kees', 0.093), ('matcher', 0.093), ('impoverished', 0.089), ('object', 0.086), ('symbolic', 0.083), ('agents', 0.083), ('deemter', 0.081), ('capabilities', 0.078), ('stroudsburg', 0.078), ('hyperarcs', 0.075), ('hypergraphs', 0.074), ('grounded', 0.066), ('imperfect', 0.065), ('attribute', 0.065), ('partner', 0.065), ('viethen', 0.065), ('enlg', 0.065), ('descriptions', 0.063), ('basis', 0.061), ('golland', 0.059), ('collaborate', 0.059), ('inlg', 0.059), ('shared', 0.058), ('dale', 0.056), ('changsong', 0.056), ('gestalt', 0.056), ('jette', 0.056), ('joyce', 0.056), ('mediating', 0.056), ('relatum', 0.056), ('stapler', 0.056), ('tenbrink', 0.056), ('expression', 0.056), ('pa', 0.054), ('relations', 0.051), ('albert', 0.049), ('robots', 0.049), ('anja', 0.049), ('rgb', 0.049), ('conversation', 0.048), ('expressions', 0.047), ('automated', 0.047), ('communication', 0.047), ('regular', 0.045), ('grounding', 0.045), ('group', 0.045), ('liu', 0.045), ('descriptors', 0.044), ('theune', 0.044), ('humans', 0.044), ('van', 0.042), ('perceptions', 0.041), ('grouping', 0.041), ('attributes', 0.04), ('director', 0.039), ('vision', 0.038), ('collaborative', 0.038), ('calibre', 0.037), ('croitoru', 0.037), ('dhande', 0.037), ('gallo', 0.037), ('gorniak', 0.037), ('kotaro', 0.037), ('lanbo', 0.037), ('mediate', 0.037), ('moratz', 0.037), ('satoru', 0.037), ('siebert', 0.037), ('takenobu', 0.037), ('rui', 0.037), ('roy', 0.037), ('cv', 0.035), 
('fang', 0.035), ('representation', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
2 0.28555974 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer
Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.
3 0.15272862 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
Author: Nikos Engonopoulos ; Martin Villalba ; Ivan Titov ; Alexander Koller
Abstract: We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. The combined model outperforms its components in predicting reference resolution and when to give feedback.
4 0.15057139 78 emnlp-2013-Exploiting Language Models for Visual Recognition
Author: Dieu-Thu Le ; Jasper Uijlings ; Raffaella Bernardi
Abstract: The problem of learning language models from large text corpora has been widely studied within the computational linguistic community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models are compatible to the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows that the language models built from general text corpora can be used instead of expensive annotated images and even outperform the image model when testing on a big general dataset.
5 0.10717177 98 emnlp-2013-Image Description using Visual Dependency Representations
Author: Desmond Elliott ; Frank Keller
Abstract: Describing the main event of an image involves identifying the objects depicted and predicting the relationships between them. Previous approaches have represented images as unstructured bags of regions, which makes it difficult to accurately predict meaningful relationships between regions. In this paper, we introduce visual dependency representations to capture the relationships between the objects in an image, and hypothesize that this representation can improve image description. We test this hypothesis using a new data set of region-annotated images, associated with visual dependency representations and gold-standard descriptions. We describe two template-based description generation models that operate over visual dependency representations. In an image descrip- tion task, we find that these models outperform approaches that rely on object proximity or corpus information to generate descriptions on both automatic measures and on human judgements.
7 0.10280869 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
8 0.077502087 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
9 0.062403094 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
10 0.060178697 141 emnlp-2013-Online Learning for Inexact Hypergraph Search
11 0.050661225 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
12 0.048195373 145 emnlp-2013-Optimal Beam Search for Machine Translation
13 0.047157053 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
14 0.043208361 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
15 0.042737912 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
16 0.040520664 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
17 0.039585292 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
18 0.03752616 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
20 0.035528038 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
topicId topicWeight
[(0, -0.141), (1, 0.045), (2, -0.029), (3, 0.083), (4, -0.042), (5, 0.188), (6, -0.054), (7, -0.063), (8, -0.046), (9, 0.043), (10, -0.37), (11, -0.076), (12, 0.138), (13, 0.026), (14, -0.068), (15, -0.033), (16, -0.027), (17, -0.023), (18, -0.015), (19, 0.032), (20, 0.08), (21, -0.04), (22, 0.013), (23, -0.029), (24, -0.054), (25, -0.121), (26, 0.062), (27, -0.172), (28, -0.066), (29, 0.155), (30, 0.009), (31, -0.053), (32, -0.005), (33, -0.013), (34, 0.112), (35, -0.066), (36, -0.087), (37, -0.098), (38, -0.088), (39, -0.101), (40, 0.086), (41, 0.149), (42, 0.079), (43, 0.018), (44, 0.067), (45, -0.13), (46, -0.012), (47, 0.057), (48, 0.127), (49, -0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.96583217 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
2 0.76943099 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
3 0.71727157 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
4 0.56631637 78 emnlp-2013-Exploiting Language Models for Visual Recognition
5 0.54268342 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu
Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classi- fiers considerably in word sense disambiguation and document classification tasks.
6 0.36410278 98 emnlp-2013-Image Description using Visual Dependency Representations
8 0.33237717 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
9 0.32093287 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
10 0.30081302 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions
11 0.23457146 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching
12 0.23115225 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
14 0.22891995 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
15 0.21605079 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
16 0.21213211 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers
17 0.1891522 201 emnlp-2013-What is Hidden among Translation Rules
19 0.1768558 145 emnlp-2013-Optimal Beam Search for Machine Translation
20 0.17099965 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition
topicId topicWeight
[(3, 0.026), (8, 0.024), (18, 0.03), (22, 0.038), (26, 0.021), (30, 0.049), (50, 0.055), (51, 0.131), (53, 0.319), (66, 0.018), (71, 0.024), (75, 0.033), (90, 0.015), (94, 0.031), (96, 0.021), (97, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.7626245 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
2 0.75796002 54 emnlp-2013-Decipherment with a Million Random Restarts
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.
3 0.72205615 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches
Author: Yanchuan Sim ; Brice D. L. Acree ; Justin H. Gross ; Noah A. Smith
Abstract: We seek to measure political candidates’ ideological positioning from their speeches. To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. The results are validated against a set of preregistered, domain-expert-authored hypotheses.
4 0.64714217 26 emnlp-2013-Assembling the Kazakh Language Corpus
Author: Olzhas Makhambetov ; Aibek Makazhanov ; Zhandos Yessenbayev ; Bakhyt Matkarimov ; Islam Sabyrgaliyev ; Anuar Sharafudinov
Abstract: This paper presents the Kazakh Language Corpus (KLC), which is one of the first attempts made within a local research community to assemble a Kazakh corpus. KLC is designed to be a large-scale corpus containing over 135 million words and covering five stylistic genres: literary, publicistic, official, scientific and informal. Along with its primary part, KLC comprises (i) an annotated sub-corpus of segmented documents encoded in the eXtensible Markup Language (XML) that marks the complete morphological, syntactic, and structural characteristics of texts, and (ii) a sub-corpus of annotated speech data. KLC has a web-based corpus management system that helps users navigate the data and retrieve the information they need. KLC is also open to contributors who are willing to make suggestions, donate texts and help with the annotation of existing materials.
5 0.46353289 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer
Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.
6 0.46011028 23 emnlp-2013-Animacy Detection with Voting Models
7 0.45069906 10 emnlp-2013-A Multi-Teraflop Constituency Parser using GPUs
8 0.4390105 121 emnlp-2013-Learning Topics and Positions from Debatepedia
9 0.43746853 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
11 0.43084034 98 emnlp-2013-Image Description using Visual Dependency Representations
12 0.42726046 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
13 0.42622754 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
14 0.42566156 137 emnlp-2013-Multi-Relational Latent Semantic Analysis
15 0.42491049 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
16 0.42450953 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
17 0.42431489 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
18 0.42420825 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching
19 0.42368084 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
20 0.42367628 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction