166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Source: pdf

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unknown-abstract

Reference: text

Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Qualitative Modeling of Spatial Prepositions and Motion Expressions. Inderjeet Mani, Children's Organization of Southeast Asia, Thailand. [sent-1, score-0.043]

2 The ability to understand spatial prepositions and motion in natural language will enable a variety of new applications involving systems that can respond to verbal directions, map travel guides, display incident reports, etc. [sent-3, score-1.474]

3 providing for enhanced information extraction, question-answering, information retrieval, and more principled text-to-scene rendering. [sent-4, score-0.115]

4 Until now, however, the semantics of spatial relations and motion verbs has been highly problematic. [sent-5, score-1.379]

5 This tutorial presents a new approach to the semantics of spatial descriptions and motion expressions based on linguistically interpreted qualitative reasoning. [sent-6, score-1.85]

6 Our approach allows for formal inference from spatial descriptions in natural language, while leveraging annotation schemes for time, space, and motion, along with machine learning from annotated corpora. [sent-7, score-0.806]

7 We introduce a compositional semantics for motion expressions that integrates spatial primitives drawn from qualitative calculi. [sent-8, score-1.842]

8 No previous exposure to the semantics of spatial prepositions or motion verbs is assumed. [sent-9, score-1.484]

9 The tutorial will sharpen cross-linguistic intuitions about the interpretation of spatial prepositions and motion constructions. [sent-10, score-1.4]

10 The attendees will also learn about qualitative reasoning schemes for static and dynamic spatial information, as well as three annotation schemes: TimeML, SpatialML, and ISO-Space, for time, space, and motion, respectively. [sent-11, score-1.183]

11 While both cognitive and formal linguistics have examined the meaning of motion verbs and spatial prepositions, these earlier approaches do not yield precise computable representations that are expressive enough for natural languages. [sent-12, score-1.421]

12 However, the previous literature makes it clear that [sent-13, score-0.051]

13 communication of motion relies on imprecise and highly abstract geometric descriptions, rather than Euclidean ones that specify the coordinates and shapes of every object. [sent-15, score-0.866]

14 This property makes these expressions a fit target for the field of qualitative spatial reasoning in AI, which has developed a rich set of geometric primitives for representing time, space (including distance, orientation, and topological relations), and motion. [sent-16, score-1.404]

15 The results of such research have yielded a wide variety of spatial and temporal reasoning logics and tools. [sent-17, score-0.736]

16 By reviewing these calculi and resources, this tutorial aims to systematically connect qualitative reasoning to natural language. [sent-18, score-0.809]

17 A qualitative model for static spatial descriptions and for path verbs. [sent-24, score-1.034]

18 Semantics of spatial PPs mapped to qualitative spatial reasoning. [sent-29, score-1.379]

19 Qualitative calculi for representing topological and orientation relations. [sent-30, score-0.334]

20 DITL representations for manner-of-motion verbs and path verbs. [sent-36, score-0.159]

21 Compositional semantics for motion expressions in DITL, with the spatial primitives drawn from qualitative calculi. [sent-37, score-1.762]

22 Route navigation, mapping travel narratives, QA, scene rendering from text, and generating event descriptions. [sent-41, score-0.165]
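The sentences above refer to qualitative calculi for topological relations without naming a specific calculus. As a rough illustration only, the sketch below assumes RCC-8-style relation names and applies them to closed 1-D intervals rather than full spatial regions; the function name `rcc8_interval` and the interval simplification are assumptions made for this example, not part of the tutorial.

```python
def rcc8_interval(a: tuple, b: tuple) -> str:
    """Qualitative topological relation between closed intervals a and b.

    Intervals are (start, end) with start < end. Relation names follow
    RCC-8; using 1-D intervals instead of regions is a simplification.
    """
    a0, a1 = a
    b0, b1 = b
    if a1 < b0 or b1 < a0:
        return "DC"      # disconnected: no shared point
    if a1 == b0 or b1 == a0:
        return "EC"      # externally connected: touch at one endpoint
    if (a0, a1) == (b0, b1):
        return "EQ"      # equal
    if b0 <= a0 and a1 <= b1:   # a lies inside b
        return "TPP" if a0 == b0 or a1 == b1 else "NTPP"
    if a0 <= b0 and b1 <= a1:   # b lies inside a
        return "TPPi" if a0 == b0 or a1 == b1 else "NTPPi"
    return "PO"          # partial overlap

print(rcc8_interval((1, 2), (0, 5)))  # NTPP: strictly inside
print(rcc8_interval((0, 3), (2, 5)))  # PO: partial overlap
```

The point of such a representation is that it records only the relation name, not coordinates, which is the kind of abstraction the summary describes as suited to natural-language spatial descriptions.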

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('motion', 0.596), ('spatial', 0.54), ('qualitative', 0.299), ('calculi', 0.195), ('ditl', 0.147), ('primitives', 0.128), ('reasoning', 0.116), ('prepositions', 0.116), ('descriptions', 0.104), ('geometric', 0.103), ('verbs', 0.098), ('tutorial', 0.097), ('semantics', 0.095), ('expressions', 0.074), ('schemes', 0.069), ('scene', 0.069), ('pps', 0.069), ('topological', 0.062), ('travel', 0.062), ('static', 0.06), ('compositional', 0.058), ('orientation', 0.054), ('eet', 0.043), ('schedule', 0.043), ('attendees', 0.043), ('guides', 0.043), ('brandeis', 0.039), ('waltham', 0.039), ('incident', 0.039), ('exposure', 0.039), ('southeast', 0.039), ('imprecise', 0.039), ('euclidean', 0.036), ('coordinates', 0.036), ('computable', 0.036), ('reviewing', 0.036), ('shapes', 0.034), ('navigation', 0.034), ('inderjeet', 0.034), ('mani', 0.034), ('timeml', 0.034), ('rendering', 0.034), ('narratives', 0.034), ('thailand', 0.034), ('formal', 0.034), ('temporal', 0.034), ('respond', 0.033), ('route', 0.033), ('acsslo', 0.033), ('faogre', 0.033), ('jourliya', 0.033), ('ame', 0.033), ('intuitions', 0.033), ('annotation', 0.031), ('path', 0.031), ('jeju', 0.031), ('drawn', 0.03), ('representations', 0.03), ('interval', 0.029), ('pustejovsky', 0.029), ('relations', 0.028), ('leveraging', 0.028), ('expressive', 0.026), ('qa', 0.026), ('overview', 0.026), ('sp', 0.026), ('variety', 0.025), ('dynamic', 0.025), ('connect', 0.025), ('principled', 0.024), ('systematically', 0.024), ('display', 0.024), ('interpreted', 0.024), ('representing', 0.023), ('highly', 0.022), ('integrates', 0.022), ('enhanced', 0.022), ('gmai', 0.022), ('asia', 0.022), ('organization', 0.022), ('space', 0.022), ('yielded', 0.021), ('cognitive', 0.021), ('linguistically', 0.021), ('verbal', 0.02), ('republic', 0.02), ('precise', 0.02), ('logic', 0.02), ('examined', 0.02), ('involving', 0.019), ('specify', 0.019), ('fit', 0.019), ('property', 0.018), ('children', 0.018), ('interpretation', 0.018), ('literature', 0.018), ('ai', 0.017), ('relies', 0.017), ('directions', 0.017), ('aims', 0.017)]
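The sentence scores in the summary table appear consistent with summing the listed tf-idf weights over each sentence's whitespace-separated tokens. This is an inference from the numbers shown on this page, not a documented description of the pipeline, and the function name `sentence_score` is hypothetical. A minimal sketch using a subset of the weights listed above:

```python
# Subset of the tf-idf weights listed above (word -> weight).
TFIDF = {
    "motion": 0.596, "spatial": 0.54, "qualitative": 0.299,
    "prepositions": 0.116, "verbs": 0.098, "semantics": 0.095,
    "exposure": 0.039, "relations": 0.028, "highly": 0.022,
}

def sentence_score(sentence: str, weights: dict) -> float:
    """Sum the tf-idf weight of every whitespace token (assumed scoring rule).

    Tokens are matched exactly, so "calculi." or "PPs" would not match
    the lowercase, punctuation-free entries "calculi" or "pps".
    """
    return sum(weights.get(token, 0.0) for token in sentence.split())

s4 = ("Until now, however, the semantics of spatial relations and "
      "motion verbs has been highly problematic.")
s8 = ("No previous exposure to the semantics of spatial prepositions "
      "or motion verbs is assumed.")

print(round(sentence_score(s4, TFIDF), 3))  # 1.379, the score shown for sent-5
print(round(sentence_score(s8, TFIDF), 3))  # 1.484, the score shown for sent-9
```

Under this assumed rule, sentences that repeat high-weight words such as "motion" and "spatial" rank highest, which matches the ordering seen in the summary.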

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unknown-abstract

2 0.063958853 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems

Author: Srinivasan Janarthanam ; Oliver Lemon ; Xingkun Liu

Abstract: We demonstrate a web-based environment for development and testing of different pedestrian route instruction-giving systems. The environment contains a City Model, a TTS interface, a game-world, and a user GUI including a simulated street-view. We describe the environment and components, the metrics that can be used for the evaluation of pedestrian route instruction-giving systems, and the shared challenge which is being organised using this environment.

3 0.062081218 69 acl-2012-Deep Learning for NLP (without Magic)

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

4 0.054564327 191 acl-2012-Temporally Anchored Relation Extraction

Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo

Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graph-based document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve 69% of the upper-bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.

5 0.04831586 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

Author: Eric Xing

Abstract: Probabilistic topic models have recently gained much popularity in information retrieval and related areas. Via such models, one can project high-dimensional objects such as text documents into a low-dimensional space where their latent semantics are captured and modeled; can integrate multiple sources of information to "share statistical strength" among components of a hierarchical probabilistic model; and can structurally display and classify otherwise unstructured object collections. However, to many practitioners, it remains unclear, in a principled way, how topic models work; what to expect and not to expect from a topic model; how it differs from and relates to classical matrix-algebraic techniques such as LSI and NMF in NLP; how to empower topic models to deal with complex scenarios such as multimodal data, contractual text in social media, evolving corpora, or the presence of supervision such as labeling and rating; and how to make topic modeling computationally tractable even on web-scale data. In this tutorial, I will demystify the conceptual, mathematical, and computational issues behind all such problems surrounding topic models and their applications by presenting a systematic overview of the mathematical foundation of topic modeling and its connections to a number of related methods popular in other fields, such as LDA, the admixture model, the mixed membership model, latent space models, and sparse coding. I will offer a simple and unifying view of all these techniques under the framework of multi-view latent space embedding, and outline the roadmap of model extension and algorithmic design toward different applications in IR and NLP. A main theme of this tutorial, one that ties together a wide range of issues and problems, will build on the "probabilistic graphical model" formalism, a formalism that exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces. I will use this formalism as a main aid to discuss both the mathematical underpinnings of the models and the related computational issues in a unified, simplistic, transparent, and actionable fashion.
Tutorial Abstracts of ACL 2012, page 3, Jeju, Republic of Korea, 8 July 2012. ©2012 Association for Computational Linguistics

6 0.044273909 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP

7 0.042722497 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

8 0.040848885 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

9 0.040249128 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

10 0.036240187 76 acl-2012-Distributional Semantics in Technicolor

11 0.034729633 51 acl-2012-Collective Generation of Natural Image Descriptions

12 0.034660712 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition

13 0.033116896 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

14 0.031584799 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

15 0.031411286 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

16 0.030178588 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

17 0.030122448 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction

18 0.029968953 126 acl-2012-Labeling Documents with Timestamps: Learning from their Time Expressions

19 0.026881289 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model

20 0.026435303 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
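The `simValue` column is not defined on this page. A common choice for comparing tf-idf vectors is cosine similarity, and the same-paper value of 1.0000001 is what a floating-point cosine of a vector with itself can look like. The sketch below assumes that metric; the function name `cosine` and the second toy vector are invented for illustration.

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Toy vectors: the first reuses three weights from this paper's list,
# the second is invented purely for illustration.
this_paper = {"motion": 0.596, "spatial": 0.54, "qualitative": 0.299}
other_paper = {"spatial": 0.4, "route": 0.5, "pedestrian": 0.6}

print(round(cosine(this_paper, this_paper), 6))  # 1.0
print(cosine(this_paper, other_paper))           # between 0 and 1
```

Only shared terms contribute to the dot product, so two papers with few words in common receive low values like those in the list above.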

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.053), (1, 0.048), (2, -0.023), (3, 0.028), (4, 0.006), (5, -0.019), (6, -0.008), (7, 0.016), (8, -0.033), (9, -0.024), (10, -0.033), (11, 0.025), (12, 0.018), (13, 0.063), (14, -0.058), (15, 0.014), (16, -0.095), (17, 0.043), (18, -0.035), (19, 0.008), (20, -0.001), (21, 0.013), (22, -0.052), (23, 0.086), (24, 0.072), (25, 0.13), (26, 0.068), (27, -0.077), (28, 0.063), (29, 0.047), (30, -0.045), (31, -0.094), (32, 0.017), (33, -0.117), (34, -0.027), (35, 0.055), (36, 0.068), (37, 0.053), (38, 0.047), (39, -0.111), (40, 0.033), (41, -0.044), (42, -0.006), (43, 0.029), (44, -0.071), (45, -0.004), (46, 0.054), (47, 0.121), (48, 0.045), (49, 0.098)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98454416 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unknown-abstract

2 0.56218427 69 acl-2012-Deep Learning for NLP (without Magic)

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

3 0.51117009 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP

Author: Amar Subramanya ; Partha Pratim Talukdar

Abstract: While labeled data is expensive to prepare, ever increasing amounts of unlabeled linguistic data are becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer vision and NLP. In particular, recent NLP research has successfully used graph-based SSL algorithms for PoS tagging (Subramanya et al., 2010), semantic parsing (Das and Smith, 2011), knowledge acquisition (Talukdar et al., 2008), sentiment analysis (Goldberg and Zhu, 2006) and text categorization (Subramanya and Bilmes, 2008). Recognizing this promising and emerging area of research, this tutorial focuses on graph-based SSL algorithms (e.g., label propagation methods). The tutorial is intended to be a sequel to the ACL 2008 SSL tutorial, focusing exclusively on graph-based SSL methods and recent advances in this area, which were beyond the scope of the previous tutorial. The tutorial is divided in two parts. In the first part, we will motivate the need for graph-based SSL methods, introduce some standard graph-based SSL algorithms, and discuss connections between these approaches. We will also discuss how linguistic data can be encoded as graphs and show how graph-based algorithms can be scaled to large amounts of data (e.g., web-scale data). Part 2 of the tutorial will focus on how graph-based methods can be used to solve several critical NLP tasks, including basic problems such as PoS tagging, semantic parsing, and more downstream tasks such as text categorization, information acquisition, and sentiment analysis. We will conclude the tutorial with some exciting avenues for future work. Familiarity with semi-supervised learning and graph-based methods will not be assumed, and the necessary background will be provided. Examples from NLP tasks will be used throughout the tutorial to convey the necessary concepts. At the end of this tutorial, the attendee will walk away with the following:
• An in-depth knowledge of the current state-of-the-art in graph-based SSL algorithms, and the ability to implement them.
• The ability to decide on the suitability of graph-based SSL methods for a problem.
• Familiarity with different NLP tasks where graph-based SSL methods have been successfully applied.
In addition to the above goals, we hope that this tutorial will better prepare the attendee to conduct exciting research at the intersection of NLP and other emerging areas with natural graph-structured data (e.g., Computational Social Science). Please visit http://graph-ssl.wikidot.com/ for details.
References
Dipanjan Das and Noah A. Smith. 2011. Semi-supervised frame-semantic parsing for unknown predicates. In Proceedings of the ACL: Human Language Technologies.
Andrew B. Goldberg and Xiaojin Zhu. 2006. Seeing stars when there aren't many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the Workshop on Graph Based Methods for NLP.
Amarnag Subramanya and Jeff Bilmes. 2008. Soft-supervised text classification. In EMNLP.
Amarnag Subramanya, Slav Petrov, and Fernando Pereira. 2010. Graph-based semi-supervised learning of structured tagging models. In EMNLP.
Partha Pratim Talukdar, Joseph Reisinger, Marius Pasca, Deepak Ravichandran, Rahul Bhagat, and Fernando Pereira. 2008. Weakly supervised acquisition of labeled class instances using graph random walks. In EMNLP.

4 0.51116103 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

Author: Eric Xing

5 0.32775694 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems

Author: Srinivasan Janarthanam ; Oliver Lemon ; Xingkun Liu

6 0.3179926 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

7 0.31238449 76 acl-2012-Distributional Semantics in Technicolor

8 0.30797946 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

9 0.29366705 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition

10 0.26915798 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

11 0.26412171 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

12 0.26403904 186 acl-2012-Structuring E-Commerce Inventory

13 0.24604057 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

14 0.23431835 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

15 0.2197331 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

16 0.20276783 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

17 0.20204213 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

18 0.2005813 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

19 0.20056677 51 acl-2012-Collective Generation of Natural Image Descriptions

20 0.19771194 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.013), (26, 0.043), (28, 0.012), (37, 0.026), (39, 0.063), (59, 0.066), (74, 0.017), (79, 0.438), (82, 0.031), (84, 0.044), (85, 0.015), (90, 0.036), (92, 0.032), (99, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.88564861 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unknown-abstract

2 0.44043225 163 acl-2012-Prediction of Learning Curves in Machine Translation

Author: Prasanth Kolachina ; Nicola Cancedda ; Marc Dymetman ; Sriram Venkatapathy

Abstract: Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. Since ad-hoc manual translation can represent a significant investment in time and money, a prior assessment of the amount of training data required to achieve a satisfactory accuracy level can be very useful. In this work, we show how to predict what the learning curve would look like if we were to manually translate increasing amounts of data. We consider two scenarios: 1) monolingual samples in the source and target languages are available, and 2) an additional small amount of parallel corpus is also available. We propose methods for predicting learning curves in both these scenarios.

3 0.25793609 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

Author: Deyi Xiong ; Min Zhang ; Haizhou Li

Abstract: Predicate-argument structure contains rich semantic information of which statistical machine translation hasn't taken full advantage. In this paper, we propose two discriminative, feature-based models to exploit predicate-argument structures for statistical machine translation: 1) a predicate translation model and 2) an argument reordering model. The predicate translation model explores lexical and semantic contexts surrounding a verbal predicate to select desirable translations for the predicate. The argument reordering model automatically predicts the moving direction of an argument relative to its predicate after translation using semantic features. The two models are integrated into a state-of-the-art phrase-based machine translation system and evaluated on Chinese-to-English translation tasks with large-scale training data. Experimental results demonstrate that the two models significantly improve translation accuracy.

4 0.25102541 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments

Author: Luciana Benotti ; Martin Villalba ; Tessa Lau ; Julian Cerruti

Abstract: Previous approaches to instruction interpretation have required either extensive domain adaptation or manually annotated corpora. This paper presents a novel approach to instruction interpretation that leverages a large amount of unannotated, easy-to-collect data from humans interacting with a virtual world. We compare several algorithms for automatically segmenting and discretizing this data into (utterance, reaction) pairs and training a classifier to predict reactions given the next utterance. Our empirical analysis shows that the best algorithm achieves 70% accuracy on this task, with no manual annotation required.

1 Introduction and motivation

Mapping instructions into automatically executable actions would enable the creation of natural language interfaces to many applications (Lau et al., 2009; Branavan et al., 2009; Orkin and Roy, 2009). In this paper, we focus on the task of navigation and manipulation of a virtual environment (Vogel and Jurafsky, 2010; Chen and Mooney, 2011). Current symbolic approaches to the problem are brittle to the natural language variation present in instructions and require intensive rule authoring to be fit for a new task (Dzikovska et al., 2008). Current statistical approaches require extensive manual annotations of the corpora used for training (MacMahon et al., 2006; Matuszek et al., 2010; Gorniak and Roy, 2007; Rieser and Lemon, 2010). Manual annotation and rule authoring by natural language engineering experts are bottlenecks for developing conversational systems for new domains. This paper proposes a fully automated approach to interpreting natural language instructions to complete a task in a virtual world based on unsupervised recordings of human-human interactions performing that task in that virtual world.
Given unannotated corpora collected from humans following other humans' instructions, our system automatically segments the corpus into labeled training data for a classification algorithm. Our interpretation algorithm is based on the observation that similar instructions uttered in similar contexts should lead to similar actions being taken in the virtual world. Given a previously unseen instruction, our system outputs actions that can be directly executed in the virtual world, based on what humans did when given similar instructions in the past.

2 Corpora situated in virtual worlds

Our environment consists of six virtual worlds designed for the natural language generation shared task known as the GIVE Challenge (Koller et al., 2010), where a pair of partners must collaborate to solve a task in a 3D space (Figure 1). The "instruction follower" (IF) can move around in the virtual world, but has no knowledge of the task. The "instruction giver" (IG) types instructions to the IF in order to guide him to accomplish the task. Each corpus contains the IF's actions and position recorded every 200 milliseconds, as well as the IG's instructions with their timestamps. We used two corpora for our experiments. The Cm corpus (Gargett et al., 2010) contains instructions given by multiple people, consisting of 37 games spanning 2163 instructions over 8:17 hs. The Cs corpus (Benotti and Denis, 2011), gathered using a single IG, is composed of 63 games and 3417 instructions, and was recorded in a span of 6:09 hs. It took less than 15 hours to collect the corpora through the web and the subjects reported that the experiment was fun.

Figure 1: A screenshot of a virtual world. The world consists of interconnecting hallways, rooms and objects.

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 181–186, Jeju, Republic of Korea, 8-14 July 2012. ©2012 Association for Computational Linguistics
While the environment is restricted, people describe the same route and the same objects in extremely different ways. Below are some examples of instructions from our corpus, all given for the same route shown in Figure 1.
1) out
2) walk down the passage
3) nowgo [sic] to the pink room
4) back to the room with the plant
5) Go through the door on the left
6) go through opening with yellow wall paper
People describe routes using landmarks (4) or specific actions (2). They may describe the same object differently (5 vs 6). Instructions also differ in their scope (3 vs 1). Thus, even ignoring spelling and grammatical errors, navigation instructions contain considerable variation which makes interpreting them a challenging problem.

3 Learning from previous interpretations

Our algorithm consists of two phases: annotation and interpretation. Annotation is performed only once and consists of automatically associating each IG instruction to an IF reaction. Interpretation is performed every time the system receives an instruction and consists of predicting an appropriate reaction given reactions observed in the corpus. Our method is based on the assumption that a reaction captures the semantics of the instruction that caused it. Therefore, if two utterances result in the same reaction, they are paraphrases of each other, and similar utterances should generate the same reaction. This approach enables us to predict reactions for previously-unseen instructions.

3.1 Annotation phase

The key challenge in learning from massive amounts of easily-collected data is to automatically annotate an unannotated corpus. Our annotation method consists of two parts: first, segmenting a low-level interaction trace into utterances and corresponding reactions, and second, discretizing those reactions into canonical action sequences. Segmentation enables our algorithm to learn from traces of IFs interacting directly with a virtual world.
Since the IF can move freely in the virtual world, his actions are a stream of continuous behavior. Segmentation divides these traces into reactions that follow from each utterance of the IG. Consider the following example starting at the situation shown in Figure 1:
IG(1): go through the yellow opening
IF(2): [walks out of the room]
IF(3): [turns left at the intersection]
IF(4): [enters the room with the sofa]
IG(5): stop
It is not clear whether the IF is doing ⟨3, 4⟩ because he is reacting to 1 or because he is being proactive. While one could manually annotate this data to remove extraneous actions, our goal is to develop automated solutions that enable learning from massive amounts of data. We decided to approach this problem by experimenting with two alternative formal definitions: 1) a strict definition that considers the maximum reaction according to the IF behavior, and 2) a loose definition based on the empirical observation that, in situated interaction, most instructions are constrained by the current visually perceived affordances (Gibson, 1979; Stoia et al., 2006). We formally define behavior segmentation (Bhv) as follows. A reaction r_k to an instruction u_k begins right after the instruction u_k is uttered and ends right before the next instruction u_{k+1} is uttered. In the example, instruction 1 corresponds to ⟨2, 3, 4⟩. We formally define visibility segmentation (Vis) as follows. A reaction r_k to an instruction u_k begins right after the instruction u_k is uttered and ends right before the next instruction u_{k+1} is uttered or right after the IF leaves the area visible at 360° from where u_k was uttered. In the example, instruction 1's reaction would be limited to ⟨2⟩ because the intersection is not visible from where the instruction was uttered. The Bhv and Vis methods define how to segment an interaction trace into utterances and their corresponding reactions.
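The two segmentation definitions can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the `Event` class, the `stays_visible` flag (whether the IF is still inside the area visible from where the last instruction was uttered after performing the action), and the flag values in the example trace are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    step: int                    # position in the interaction trace
    speaker: str                 # "IG" for an instruction, "IF" for an action
    stays_visible: bool = True   # IF only: still in the visible area afterwards

def segment(trace, strategy="bhv"):
    """Map each IG instruction step to the IF steps forming its reaction."""
    reactions = {}
    current = None     # step of the instruction being reacted to
    accepting = False  # whether the current reaction still accepts actions
    for ev in trace:
        if ev.speaker == "IG":
            current, accepting = ev.step, True
            reactions[current] = []
        elif current is not None and accepting:
            reactions[current].append(ev.step)
            # Vis: the reaction ends right after the IF leaves the visible area.
            if strategy == "vis" and not ev.stays_visible:
                accepting = False
    return reactions

# The five-event example from the text; visibility flags are assumed.
trace = [
    Event(1, "IG"),
    Event(2, "IF", stays_visible=False),  # walks out of the room
    Event(3, "IF", stays_visible=False),  # turns left at the intersection
    Event(4, "IF", stays_visible=False),  # enters the room with the sofa
    Event(5, "IG"),
]
print(segment(trace, "bhv"))  # {1: [2, 3, 4], 5: []}
print(segment(trace, "vis"))  # {1: [2], 5: []}
```

Bhv keeps every action up to the next instruction, while Vis cuts the reaction at the first action that takes the IF out of view.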
However, users frequently perform noisy behavior that is irrelevant to the goal of the task. For example, after hearing an instruction, an IF might go into the wrong room, realize the error, and leave the room. A reaction should not include such irrelevant actions. In addition, IFs may accomplish the same goal using different behaviors: two different IFs may interpret "go to the pink room" by following different paths to the same destination. We would like to be able to generalize both reactions into one canonical reaction. As a result, our approach discretizes reactions into higher-level action sequences with less noise and less variation. Our discretization algorithm uses an automated planner and a planning representation of the task. This planning representation includes: (1) the task goal, (2) the actions which can be taken in the virtual world, and (3) the current state of the virtual world. Using the planning representation, the planner calculates an optimal path between the starting and ending states of the reaction, eliminating all unnecessary actions. While we use the classical planner FF (Hoffmann, 2003), our technique could also work with classical planning (Nau et al., 2004) or other techniques such as probabilistic planning (Bonet and Geffner, 2005). It is also not dependent on a particular discretization of the world in terms of actions. Now we are ready to define canonical reaction c_k formally. Let S_k be the state of the virtual world when instruction u_k was uttered, S_{k+1} be the state of the world where the reaction ends (as defined by Bhv or Vis segmentation), and D be the planning domain representation of the virtual world. The canonical reaction to u_k is defined as the sequence of actions returned by the planner with S_k as initial state, S_{k+1} as goal state and D as planning domain.

3.2 Interpretation phase

The annotation phase results in a collection of (u_k, c_k) pairs.
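To make the idea of a canonical reaction concrete, the sketch below replaces the planner with a plain breadth-first search over a tiny hand-written state graph. BFS is a stand-in chosen for brevity and is not the FF planner the paper uses; the state names, action names, and the `MOVES` table are hypothetical.

```python
from collections import deque

def canonical_reaction(start, goal, moves):
    """Shortest action sequence from start to goal via breadth-first search.

    `moves` maps a state to {action_name: next_state}. BFS stands in for
    the planner here; it is not the FF planner used in the paper.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan
        for action, nxt in moves.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None  # goal unreachable

# Hypothetical world. An IF who wandered room -> hall -> wrong_room -> hall
# -> pink_room and one who went straight there end in the same state.
MOVES = {
    "room": {"exit_room": "hall"},
    "hall": {"enter_wrong_room": "wrong_room",
             "enter_pink_room": "pink_room",
             "enter_room": "room"},
    "wrong_room": {"exit_wrong_room": "hall"},
}
print(canonical_reaction("room", "pink_room", MOVES))
# ['exit_room', 'enter_pink_room']
```

Because only the start and end states are passed in, the detour through the wrong room never appears in the result, which is the noise-removal effect described above.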
The interpretation phase uses these pairs to interpret new utterances in three steps. First, we filter the set of pairs into those whose reactions can be directly executed from the current IF position. Second, we group the filtered pairs according to their reactions. Third, we select the group with utterances most similar to the new utterance, and output that group's reaction. Figure 2 shows the output of the first two steps: three groups of pairs whose reactions can all be executed from the IF's current position.

Figure 2: Utterance groups for this situation. Colored arrows show the reaction associated with each group.

We treat the third step, selecting the most similar group for a new utterance, as a classification problem. We compare three different classification methods. One method uses nearest-neighbor classification with three different similarity metrics: Jaccard and Overlap coefficients (both of which measure the degree of overlap between two sets, differing only in the normalization of the final value (Nikravesh et al., 2005)), and Levenshtein Distance (a string metric for measuring the amount of differences between two sequences of words (Levenshtein, 1966)). Our second classification method employs a strategy in which we considered each group as a set of possible machine translations of our utterance, using the BLEU measure (Papineni et al., 2002) to select which group could be considered the best translation of our utterance. Finally, we trained an SVM classifier (Cortes and Vapnik, 1995) using the unigrams of each paraphrase and the position of the IF as features, and setting their group as the output class using a libSVM wrapper (Chang and Lin, 2011).

Table 1: Accuracy comparison between Cm and Cs for Bhv and Vis segmentation
Algorithm     Cm Bhv   Cm Vis   Cs Bhv   Cs Vis
Jaccard       47%      54%      54%      70%
Overlap       43%      53%      45%      60%
BLEU          44%      52%      54%      50%
SVM           33%      29%      45%      29%
Levenshtein   21%      20%      8%       17%
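The three nearest-neighbor similarity metrics named above are standard and short enough to write out. The sketch below is illustrative only: the `best_group` helper, the reaction labels, and the stored utterances are hypothetical, and the group selection uses Jaccard alone rather than reproducing the paper's full comparison.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap(a: set, b: set) -> float:
    """Size of the intersection divided by the smaller set's size."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def levenshtein(a: list, b: list) -> int:
    """Word-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute
        prev = cur
    return prev[-1]

def best_group(utterance: str, groups: dict) -> str:
    """Return the reaction whose stored utterances best match (Jaccard)."""
    words = set(utterance.lower().split())
    return max(groups, key=lambda r: max(
        jaccard(words, set(u.lower().split())) for u in groups[r]))

# Hypothetical groups: reaction label -> utterances observed with it.
groups = {
    "go_to_pink_room": ["go to the pink room", "nowgo to the pink room"],
    "exit_left_door": ["go through the door on the left"],
}
print(best_group("walk to the pink room", groups))  # go_to_pink_room
print(levenshtein("go to the pink room".split(), "go to the room".split()))  # 1
```

Jaccard and Overlap differ only in the denominator, so Overlap is more forgiving when a short utterance is fully contained in a longer one.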
When the system misinterprets an instruction, we use an approach similar to what people do to overcome misunderstandings. If the system executes an incorrect reaction, the IG can tell the system to cancel its current interpretation and try again using a paraphrase, selecting a different reaction.

4 Evaluation
For the evaluation phase, we annotated both the Cm and Cs corpora entirely, and then we split them in an 80/20 proportion; the first 80% of data collected in each virtual world was used for training, while the remaining 20% was used for testing. For each pair (uk, ck) in the testing set, we used our algorithm to predict the reaction to the selected utterance, and then compared this result against the automatically annotated reaction. Table 1 shows the results.

Comparing the Bhv and Vis segmentation strategies, Vis tends to obtain better results than Bhv. In addition, accuracy on the Cs corpus was generally higher than on Cm. Given that Cs contained only one IG, we believe this led to less variability in the instructions and less noise in the training data.

We evaluated the impact of user corrections by simulating them using the existing corpus. In case of a wrong response, the algorithm receives a second utterance with the same reaction (a paraphrase of the previous one). Then the new utterance is tested over the same set of possible groups, except for the one which was returned before. If the correct reaction is not predicted after four tries, or there are no utterances with the same reaction, the predictions are registered as wrong. To measure the effect of user corrections, we used a different evaluation process for this algorithm: first, we split the corpus in a 50/50 proportion, and then we moved correctly predicted utterances from the testing set towards training, until either there was nothing more to learn or the training set reached 80% of the entire corpus size.
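The simulated correction loop described above can be sketched as follows. This is an illustration only: `classify` is a hypothetical function that maps an utterance and a dictionary of groups to a predicted reaction, and the reading of "four tries" as four utterances in total (the original plus up to three paraphrases) is an assumption, since the text does not say how the first attempt is counted. The helper `chronological_split` mirrors the 80/20 split in which the earliest data is used for training.

```python
def chronological_split(pairs, train_fraction=0.8):
    """Split time-ordered pairs into (training, testing) without shuffling."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


def simulate_corrections(paraphrases, correct_reaction, groups, classify,
                         max_tries=4):
    """Return the 1-based try on which the correct reaction is predicted.

    paraphrases: utterances sharing correct_reaction, in the order tried.
    groups: dict mapping each reaction to its training utterances.
    Returns None when every allowed try fails (registered as wrong).
    """
    remaining = dict(groups)  # copy so the caller's groups are untouched
    for attempt, utterance in enumerate(paraphrases[:max_tries], 1):
        if not remaining:
            break
        predicted = classify(utterance, remaining)
        if predicted == correct_reaction:
            return attempt
        # A wrong group is excluded before the next paraphrase is tested.
        remaining.pop(predicted, None)
    return None


def word_overlap_classifier(utterance, groups):
    """Toy classifier (an assumption): pick the group sharing the most words."""
    words = set(utterance.split())
    return max(groups,
               key=lambda r: max(len(words & set(t.split())) for t in groups[r]))


groups = {"A": ["go left"], "B": ["go right"], "C": ["press button"]}
print(simulate_corrections(["go", "turn right"], "B", groups,
                           word_overlap_classifier))
```

In this toy run the first utterance is ambiguous between two groups and the wrong one is returned; removing that group lets the paraphrase succeed on the second try.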
As expected, user corrections significantly improve accuracy, as shown in Figure 3. The worst algorithm’s results improve linearly with each try, while the best ones behave asymptotically, barely improving after the second try. The best algorithm reaches 92% with just one correction from the IG.

Figure 3: Accuracy values with corrections over Cs

5 Discussion and future work
We presented an approach to instruction interpretation which learns from non-annotated logs of human behavior. Our empirical analysis shows that our best algorithm achieves 70% accuracy on this task, with no manual annotation required. When corrections are added, accuracy goes up to 92% for just one correction. We consider our results promising since a state-of-the-art semi-unsupervised approach to instruction interpretation (Chen and Mooney, 2011) reports 55% accuracy on manually segmented data.

We plan to compare our system’s performance against human performance in comparable situations. Our informal observations of the GIVE corpus indicate that humans often follow instructions incorrectly, so our automated system’s performance may be on par with human performance.

Although we have presented our approach in the context of 3D virtual worlds, we believe our technique is also applicable to other domains such as the web, video games, or Human Robot Interaction.

References
Luciana Benotti and Alexandre Denis. 2011. CL system: Giving instructions by corpus based selection. In Proceedings of the Generation Challenges Session at the 13th European Workshop on Natural Language Generation, pages 296–301, Nancy, France, September. Association for Computational Linguistics.
Blai Bonet and Héctor Geffner. 2005. mGPT: a probabilistic planner based on heuristic search. Journal of Artificial Intelligence Research, 24:933–944.
S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learning for mapping instructions to actions.
In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 82–90, Suntec, Singapore, August. Association for Computational Linguistics.
Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
David L. Chen and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), pages 859–865, August.
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20:273–297.
Myroslava O. Dzikovska, James F. Allen, and Mary D. Swift. 2008. Linking semantic and knowledge representations in a multi-domain dialogue system. Journal of Logic and Computation, 18:405–430, June.
Andrew Gargett, Konstantina Garoufi, Alexander Koller, and Kristina Striegnitz. 2010. The GIVE-2 corpus of giving instructions in virtual environments. In Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC), Malta.
James J. Gibson. 1979. The Ecological Approach to Visual Perception, volume 40. Houghton Mifflin.
Peter Gorniak and Deb Roy. 2007. Situated language understanding as filtering perceived affordances. Cognitive Science, 31(2):197–231.
Jörg Hoffmann. 2003. The Metric-FF planning system: Translating “ignoring delete lists” to numeric state variables. Journal of Artificial Intelligence Research (JAIR), 20:291–341.
Alexander Koller, Kristina Striegnitz, Andrew Gargett, Donna Byron, Justine Cassell, Robert Dale, Johanna Moore, and Jon Oberlander. 2010. Report on the second challenge on generating instructions in virtual environments (GIVE-2).
In Proceedings of the 6th International Natural Language Generation Conference (INLG), Dublin.
Tessa Lau, Clemens Drews, and Jeffrey Nichols. 2009. Interpreting written how-to instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1433–1438, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Technical Report 8.
Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, pages 1475–1482. AAAI Press.
Cynthia Matuszek, Dieter Fox, and Karl Koscher. 2010. Following directions using statistical machine translation. In Proceedings of the 5th ACM/IEEE international conference on Human-robot interaction, HRI ’10, pages 251–258, New York, NY, USA. ACM.
Dana Nau, Malik Ghallab, and Paolo Traverso. 2004. Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., California, USA.
Masoud Nikravesh, Tomohiro Takagi, Masanori Tajima, Akiyoshi Shinmura, Ryosuke Ohgaya, Koji Taniguchi, Kazuyosi Kawahara, Kouta Fukano, and Akiko Aizawa. 2005. Soft computing for perception-based decision processing and analysis: Web-based BISCDSS. In Masoud Nikravesh, Lotfi Zadeh, and Janusz Kacprzyk, editors, Soft Computing for Information Processing and Analysis, volume 164 of Studies in Fuzziness and Soft Computing, chapter 4, pages 93–188. Springer Berlin / Heidelberg.
Jeff Orkin and Deb Roy. 2009. Automatic learning and generation of social behavior from collective human gameplay. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, volume 1, pages 385–392. International Foundation for Autonomous Agents and Multiagent Systems.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pages 311–318, Stroudsburg, PA, USA. Association for Computational Linguistics.
Verena Rieser and Oliver Lemon. 2010. Learning human multimodal dialogue strategies. Natural Language Engineering, 16:3–23.
Laura Stoia, Donna K. Byron, Darla Magdalene Shockley, and Eric Fosler-Lussier. 2006. Sentence planning for realtime navigational instructions. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL-Short ’06, pages 157–160, Stroudsburg, PA, USA. Association for Computational Linguistics.
Adam Vogel and Dan Jurafsky. 2010. Learning to follow navigational directions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 806–814, Stroudsburg, PA, USA. Association for Computational Linguistics.

5 0.22900623 69 acl-2012-Deep Learning for NLP (without Magic)

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

6 0.21580763 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

7 0.21031731 187 acl-2012-Subgroup Detection in Ideological Discussions

8 0.21024093 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP

9 0.20717557 7 acl-2012-A Computational Approach to the Automation of Creative Naming

10 0.2059823 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition

11 0.20191334 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs

12 0.20099984 79 acl-2012-Efficient Tree-Based Topic Modeling

13 0.20063004 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

14 0.20044002 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors

15 0.19946463 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

16 0.19933954 195 acl-2012-The Creation of a Corpus of English Metalanguage

17 0.1986613 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool

18 0.19850956 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

19 0.19776604 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

20 0.19720753 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System