acl acl2012 acl2012-104 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amar Subramanya (Google Research, asubram@google.com) ; Partha Pratim Talukdar (Carnegie Mellon University, ppt@cs.cmu.edu)
Abstract: While labeled data is expensive to prepare, ever increasing amounts of unlabeled linguistic data are becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer vision and NLP. In particular, recent NLP research has successfully used graph-based SSL algorithms for PoS tagging (Subramanya et al., 2010), semantic parsing (Das and Smith, 2011), knowledge acquisition (Talukdar et al., 2008), sentiment analysis (Goldberg and Zhu, 2006) and text categorization (Subramanya and Bilmes, 2008). Recognizing this promising and emerging area of research, this tutorial focuses on graph-based SSL algorithms (e.g., label propagation methods). The tutorial is intended to be a sequel to the ACL 2008 SSL tutorial, focusing exclusively on graph-based SSL methods and recent advances in this area, which were beyond the scope of the previous tutorial. The tutorial is divided into two parts. In the first part, we will motivate the need for graph-based SSL methods, introduce some standard graph-based SSL algorithms, and discuss connections between these approaches. We will also discuss how linguistic data can be encoded as graphs and show how graph-based algorithms can be scaled to large amounts of data (e.g., web-scale data). Part 2 of the tutorial will focus on how graph-based methods can be used to solve several critical NLP tasks, including basic problems such as PoS tagging and semantic parsing, and more downstream tasks such as text categorization, information acquisition, and sentiment analysis. We will conclude the tutorial with some exciting avenues for future work. Familiarity with semi-supervised learning and graph-based methods will not be assumed, and the necessary background will be provided. Examples from NLP tasks will be used throughout the tutorial to convey the necessary concepts. At the end of this tutorial, the attendee will walk away with the following:
• An in-depth knowledge of the current state-of-the-art in graph-based SSL algorithms, and the ability to implement them.
• The ability to decide on the suitability of graph-based SSL methods for a problem.
• Familiarity with different NLP tasks where graph-based SSL methods have been successfully applied.
In addition to the above goals, we hope that this tutorial will better prepare the attendee to conduct exciting research at the intersection of NLP and other emerging areas with natural graph-structured data (e.g., Computational Social Science). Please visit http://graph-ssl.wikidot.com/ for details.
References
Dipanjan Das and Noah A. Smith. 2011. Semi-supervised frame-semantic parsing for unknown predicates. In Proceedings of the ACL: Human Language Technologies.
Andrew B. Goldberg and Xiaojin Zhu. 2006. Seeing stars when there aren't many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the Workshop on Graph Based Methods for NLP.
Amarnag Subramanya and Jeff Bilmes. 2008. Soft-supervised text classification. In EMNLP.
Amarnag Subramanya, Slav Petrov, and Fernando Pereira. 2010. Graph-based semi-supervised learning of structured tagging models. In EMNLP.
Partha Pratim Talukdar, Joseph Reisinger, Marius Pasca, Deepak Ravichandran, Rahul Bhagat, and Fernando Pereira. 2008. Weakly supervised acquisition of labeled class instances using graph random walks. In EMNLP.
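The abstract above names label propagation as the prototypical graph-based SSL method. As a concrete anchor, here is a minimal sketch of a Zhou et al.-style label-spreading iteration on a toy similarity graph; the graph, the two seed labels, and the alpha value are illustrative assumptions, not material from the tutorial itself.

```python
import numpy as np

# Toy semi-supervised setup: 5 nodes, 2 classes; only nodes 0 and 4 are labeled.
W = np.array([  # symmetric edge weights of the similarity graph
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
Y = np.zeros((5, 2))
Y[0, 0] = 1.0  # node 0 seeded with class 0
Y[4, 1] = 1.0  # node 4 seeded with class 1

# Symmetrically normalized graph: S = D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))

# Propagate: F <- alpha * S F + (1 - alpha) * Y until convergence.
alpha, F = 0.9, Y.copy()
for _ in range(200):
    F_new = alpha * S @ F + (1 - alpha) * Y
    done = np.abs(F_new - F).max() < 1e-6
    F = F_new
    if done:
        break

print(F.argmax(axis=1))  # predicted class for every node, labeled or not
```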
Reference: text
sentIndex sentText sentNum sentScore
1 Graph-based Semi-Supervised Learning Algorithms for NLP. Amar Subramanya, Google Research, asubram@google.com [sent-1, score-0.045]
2 Abstract: While labeled data is expensive to prepare, ever increasing amounts of unlabeled linguistic data are becoming widely available. [sent-2, score-0.232]
3 In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. [sent-3, score-0.12]
4 In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. [sent-4, score-0.154]
5 Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer vision and NLP. [sent-5, score-0.092]
6 In particular, recent NLP research has successfully used graph-based SSL algorithms for PoS tagging (Subramanya et al. [sent-6, score-0.203]
7 , 2010), semantic parsing (Das and Smith, 2011), knowledge acquisition (Talukdar et al. [sent-7, score-0.11]
8 , 2008), sentiment analysis (Goldberg and Zhu, 2006) and text categorization (Subramanya and Bilmes, 2008). [sent-8, score-0.084]
9 Recognizing this promising and emerging area of research, this tutorial focuses on graph-based SSL algorithms (e.g., label propagation methods). [sent-9, score-0.63]
10 The tutorial is intended to be a sequel to the ACL 2008 SSL tutorial, focusing exclusively on graph-based SSL methods and recent advances in this area, which were beyond the scope of the previous tutorial. [sent-12, score-0.492]
11 In the first part, we will motivate the need for graph-based SSL methods, introduce some standard graph-based SSL algorithms, and discuss connections between these approaches. [sent-14, score-0.125]
12 We will also discuss how linguistic data can be encoded as graphs and show how graph-based algorithms can be scaled to large amounts of data (e.g., web-scale data). [sent-15, score-0.337]
13 We will conclude the tutorial with some exciting avenues for future work. [sent-21, score-0.571]
14 Familiarity with semi-supervised learning and graph-based methods will not be assumed, and the necessary background will be provided. [sent-22, score-0.039]
15 Examples from NLP tasks will be used throughout the tutorial to convey the necessary concepts. [sent-23, score-0.51]
16 At the end of this tutorial, the attendee will walk away with the following: • An in-depth knowledge of the current state-of-the-art in graph-based SSL algorithms, and the ability to implement them. [sent-24, score-0.24]
17 • The ability to decide on the suitability of graph-based SSL methods for a problem. [sent-25, score-0.281]
18 • Familiarity with different NLP tasks where graph-based SSL methods have been successfully applied. [sent-26, score-0.094]
19 In addition to the above goals, we hope that this tutorial will better prepare the attendee to conduct exciting research at the intersection of NLP and other emerging areas with natural graph-structured data (e.g., Computational Social Science). [sent-27, score-0.962]
20 Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. [sent-41, score-0.2]
21 Weakly supervised acquisition of labeled class instances using graph random walks. [sent-53, score-0.159]
wordName wordTfidf (topN-words)
[('ssl', 0.627), ('tutorial', 0.351), ('subramanya', 0.275), ('talukdar', 0.175), ('attendee', 0.157), ('exciting', 0.137), ('familiarity', 0.125), ('amarnag', 0.116), ('stars', 0.116), ('partha', 0.116), ('pratim', 0.116), ('goldberg', 0.105), ('emerging', 0.1), ('prepare', 0.1), ('algorithms', 0.096), ('nlp', 0.087), ('sentiment', 0.084), ('acquisition', 0.075), ('das', 0.074), ('osn', 0.068), ('suitability', 0.068), ('fernando', 0.067), ('graphs', 0.063), ('nfo', 0.062), ('pb', 0.062), ('olf', 0.062), ('bilmes', 0.062), ('xiaojin', 0.062), ('marius', 0.062), ('reisinger', 0.058), ('pasca', 0.055), ('amounts', 0.054), ('successfully', 0.054), ('tagging', 0.053), ('please', 0.052), ('aren', 0.052), ('bhagat', 0.052), ('avenues', 0.052), ('acsslo', 0.052), ('faogre', 0.052), ('jourliya', 0.052), ('area', 0.051), ('unlabeled', 0.05), ('realize', 0.05), ('scaled', 0.05), ('vision', 0.05), ('trh', 0.05), ('deepak', 0.05), ('jeju', 0.05), ('visit', 0.048), ('convey', 0.048), ('rahul', 0.046), ('jeff', 0.046), ('becoming', 0.046), ('walk', 0.046), ('dipanjan', 0.045), ('exclusively', 0.045), ('downstream', 0.045), ('seeing', 0.045), ('ravichandran', 0.045), ('google', 0.045), ('motivate', 0.043), ('graph', 0.043), ('weakly', 0.042), ('carnegie', 0.042), ('mellon', 0.042), ('bring', 0.042), ('discuss', 0.042), ('labeled', 0.041), ('started', 0.041), ('ever', 0.041), ('cmu', 0.04), ('connections', 0.04), ('intersection', 0.04), ('tasks', 0.04), ('hope', 0.039), ('necessary', 0.039), ('areas', 0.038), ('away', 0.037), ('parsing', 0.035), ('goals', 0.034), ('joseph', 0.033), ('decide', 0.033), ('propagation', 0.033), ('zhu', 0.033), ('intended', 0.033), ('republic', 0.033), ('noah', 0.033), ('phenomenon', 0.033), ('encoded', 0.032), ('focuses', 0.032), ('throughout', 0.032), ('focusing', 0.032), ('assumed', 0.032), ('categorization', 0.032), ('scope', 0.031), ('conclude', 0.031), ('slav', 0.03), ('critical', 0.029), ('adapt', 0.029)]
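For orientation, a (word, weight) list like the one above can be produced with a standard tf-idf pass over the collection. The following is a hedged sketch with a toy corpus; the documents and the top-N cutoff are illustrative assumptions, not the mining pipeline's actual code or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [  # toy stand-ins for the paper texts being mined
    "graph based ssl algorithms for nlp",
    "label propagation over similarity graphs",
    "semi supervised learning from labeled and unlabeled data",
]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Mean tf-idf weight of each word across the collection, top 5 words.
weights = X.mean(axis=0).A1
vocab = vec.get_feature_names_out()
top = sorted(zip(vocab, weights), key=lambda p: -p[1])[:5]
print([(w, round(s, 3)) for w, s in top])
```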
simIndex simValue paperId paperTitle
same-paper 1 1.0 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP
Author: Amar Subramanya ; Partha Pratim Talukdar
2 0.19326834 69 acl-2012-Deep Learning for NLP (without Magic)
Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning
Abstract: unknown-abstract
Author: Eric Xing
Abstract: Probabilistic topic models have recently gained much popularity in information retrieval and related areas. Via such models, one can project high-dimensional objects such as text documents into a low dimensional space where their latent semantics are captured and modeled; can integrate multiple sources of information to "share statistical strength" among components of a hierarchical probabilistic model; and can structurally display and classify the otherwise unstructured object collections. However, to many practitioners, how topic models work, what to and not to expect from a topic model, how it is different from and related to classical matrix algebraic techniques such as LSI and NMF in NLP, how to empower topic models to deal with complex scenarios such as multimodal data, contractual text in social media, evolving corpus, or presence of supervision such as labeling and rating, how to make topic modeling computationally tractable even on web-scale data, etc., in a principled way, remain unclear. In this tutorial, I will demystify the conceptual, mathematical, and computational issues behind all such problems surrounding the topic models and their applications by presenting a systematic overview of the mathematical foundation of topic modeling, and its connections to a number of related methods popular in other fields, such as the LDA, admixture model, mixed membership model, latent space models, and sparse coding. I will offer a simple and unifying view of all these techniques under the framework of multi-view latent space embedding, and outline the roadmap of model extension and algorithmic design toward different applications in IR and NLP. A main theme of this tutorial that ties together a wide range of issues and problems will build on the "probabilistic graphical model" formalism, a formalism that exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces. I will use this formalism as a main aid to discuss both the mathematical underpinnings for the models and the related computational issues in a unified, simplistic, transparent, and actionable fashion.
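As a concrete companion to the concepts listed above, here is a hedged sketch of fitting a small LDA model with scikit-learn; the four-document corpus and the choice of two topics are toy assumptions, not part of the tutorial.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus; real topic models need far more text than this.
docs = [
    "graph based semi supervised learning for nlp",
    "label propagation over similarity graphs",
    "topic models project documents into a latent space",
    "latent dirichlet allocation infers document topic mixtures",
]

vec = CountVectorizer()
counts = vec.fit_transform(docs)  # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Per-document topic proportions and the top words of each topic.
doc_topics = lda.transform(counts)
vocab = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [vocab[i] for i in comp.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```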
4 0.091970041 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing
Author: Alessandro Moschitti
Abstract: unknown-abstract
5 0.086921774 42 acl-2012-Bootstrapping via Graph Propagation
Author: Max Whitney ; Anoop Sarkar
Abstract: Bootstrapping a classifier from a small set of seed rules can be viewed as the propagation of labels between examples via features shared between them. This paper introduces a novel variant of the Yarowsky algorithm based on this view. It is a bootstrapping learning method which uses a graph propagation algorithm with a well-defined objective function. The experimental results show that our proposed bootstrapping algorithm achieves state-of-the-art performance or better on several different natural language data sets.
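The abstract's view of bootstrapping as label propagation through shared features can be sketched in a few lines. This is a generic Yarowsky-style loop over invented word-sense data, not the paper's actual algorithm or objective function:

```python
from collections import Counter

# Each example is a set of features; seed rules map a feature to a label.
examples = [
    {"plant", "life"}, {"plant", "factory"}, {"factory", "worker"},
    {"life", "species"}, {"worker", "manufacturing"},
]
seed_rules = {"life": "SENSE_A", "manufacturing": "SENSE_B"}

rules, labels = dict(seed_rules), {}
for _ in range(5):  # bootstrap: label examples, then induce new feature rules
    labels = {}
    for i, feats in enumerate(examples):
        votes = Counter(rules[f] for f in feats if f in rules)
        if votes:
            labels[i] = votes.most_common(1)[0][0]
    # Propagation step: a feature adopts the majority label of the labeled
    # examples it occurs in, spreading labels to examples that share it.
    new_rules = dict(rules)
    for f in {f for ex in examples for f in ex}:
        votes = Counter(labels[i] for i, ex in enumerate(examples)
                        if f in ex and i in labels)
        if votes:
            new_rules[f] = votes.most_common(1)[0][0]
    if new_rules == rules:
        break
    rules = new_rules

print({i: labels.get(i) for i in range(len(examples))})
```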
6 0.073309027 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis
7 0.066307202 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
8 0.060500458 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
9 0.058912165 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
10 0.046486299 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models
11 0.045523006 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
12 0.044461899 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
13 0.044273909 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions
14 0.038531538 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
15 0.033572983 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification
16 0.032272436 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
17 0.032079052 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
18 0.031903856 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
19 0.031747922 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
20 0.031105066 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
topicId topicWeight
[(0, -0.086), (1, 0.072), (2, 0.02), (3, -0.075), (4, 0.004), (5, 0.019), (6, 0.034), (7, 0.031), (8, -0.101), (9, -0.054), (10, 0.003), (11, 0.029), (12, 0.03), (13, -0.04), (14, 0.026), (15, 0.07), (16, -0.056), (17, 0.041), (18, -0.009), (19, 0.035), (20, 0.023), (21, 0.026), (22, -0.107), (23, 0.018), (24, 0.077), (25, 0.24), (26, 0.151), (27, -0.103), (28, -0.016), (29, 0.111), (30, 0.026), (31, -0.165), (32, 0.179), (33, -0.269), (34, -0.041), (35, 0.135), (36, 0.105), (37, 0.13), (38, -0.061), (39, -0.009), (40, -0.027), (41, 0.099), (42, 0.132), (43, 0.142), (44, 0.015), (45, 0.006), (46, -0.073), (47, 0.022), (48, -0.016), (49, -0.101)]
simIndex simValue paperId paperTitle
same-paper 1 0.97546697 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP
Author: Amar Subramanya ; Partha Pratim Talukdar
2 0.86143517 69 acl-2012-Deep Learning for NLP (without Magic)
3 0.51064926 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions
Author: Inderjeet Mani ; James Pustejovsky
Abstract: unknown-abstract
5 0.39086655 42 acl-2012-Bootstrapping via Graph Propagation
6 0.36946389 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis
7 0.32432753 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing
8 0.23965268 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
9 0.21757433 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
10 0.2087483 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
11 0.19171932 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
12 0.19099604 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
13 0.18081822 136 acl-2012-Learning to Translate with Multiple Objectives
14 0.17239764 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
15 0.15870607 112 acl-2012-Humor as Circuits in Semantic Networks
16 0.15679124 121 acl-2012-Iterative Viterbi A* Algorithm for K-Best Sequential Decoding
17 0.15597081 180 acl-2012-Social Event Radar: A Bilingual Context Mining and Sentiment Analysis Summarization System
18 0.14791675 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification
19 0.1462782 153 acl-2012-Named Entity Disambiguation in Streaming Data
20 0.14003325 83 acl-2012-Error Mining on Dependency Trees
topicId topicWeight
[(3, 0.012), (26, 0.028), (28, 0.04), (30, 0.035), (37, 0.015), (39, 0.044), (54, 0.32), (59, 0.097), (74, 0.028), (82, 0.022), (84, 0.029), (85, 0.029), (90, 0.081), (92, 0.049), (94, 0.021), (99, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.76940733 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP
Author: Amar Subramanya ; Partha Pratim Talukdar
2 0.45525458 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
Author: Luciana Benotti ; Martin Villalba ; Tessa Lau ; Julian Cerruti
Abstract: Previous approaches to instruction interpretation have required either extensive domain adaptation or manually annotated corpora. This paper presents a novel approach to instruction interpretation that leverages a large amount of unannotated, easy-to-collect data from humans interacting with a virtual world. We compare several algorithms for automatically segmenting and discretizing this data into (utterance, reaction) pairs and training a classifier to predict reactions given the next utterance. Our empirical analysis shows that the best algorithm achieves 70% accuracy on this task, with no manual annotation required.
1 Introduction and motivation
Mapping instructions into automatically executable actions would enable the creation of natural language interfaces to many applications (Lau et al., 2009; Branavan et al., 2009; Orkin and Roy, 2009). In this paper, we focus on the task of navigation and manipulation of a virtual environment (Vogel and Jurafsky, 2010; Chen and Mooney, 2011). Current symbolic approaches to the problem are brittle to the natural language variation present in instructions and require intensive rule authoring to be fit for a new task (Dzikovska et al., 2008). Current statistical approaches require extensive manual annotations of the corpora used for training (MacMahon et al., 2006; Matuszek et al., 2010; Gorniak and Roy, 2007; Rieser and Lemon, 2010). Manual annotation and rule authoring by natural language engineering experts are bottlenecks for developing conversational systems for new domains.
This paper proposes a fully automated approach to interpreting natural language instructions to complete a task in a virtual world based on unsupervised recordings of human-human interactions performing that task in that virtual world. Given unannotated corpora collected from humans following other humans' instructions, our system automatically segments the corpus into labeled training data for a classification algorithm. Our interpretation algorithm is based on the observation that similar instructions uttered in similar contexts should lead to similar actions being taken in the virtual world. Given a previously unseen instruction, our system outputs actions that can be directly executed in the virtual world, based on what humans did when given similar instructions in the past.
2 Corpora situated in virtual worlds
Our environment consists of six virtual worlds designed for the natural language generation shared task known as the GIVE Challenge (Koller et al., 2010), where a pair of partners must collaborate to solve a task in a 3D space (Figure 1). The "instruction follower" (IF) can move around in the virtual world, but has no knowledge of the task. The "instruction giver" (IG) types instructions to the IF in order to guide him to accomplish the task. Each corpus contains the IF's actions and position recorded every 200 milliseconds, as well as the IG's instructions with their timestamps. We used two corpora for our experiments. The Cm corpus (Gargett et al., 2010) contains instructions given by multiple people, consisting of 37 games spanning 2163 instructions over 8:17 hs.
Figure 1: A screenshot of a virtual world. The world consists of interconnecting hallways, rooms and objects.
The Cs corpus (Benotti and Denis, 2011), gathered using a single IG, is composed of 63 games and 3417 instructions, and was recorded in a span of 6:09 hs. It took less than 15 hours to collect the corpora through the web, and the subjects reported that the experiment was fun. While the environment is restricted, people describe the same route and the same objects in extremely different ways. Below are some examples of instructions from our corpus, all given for the same route shown in Figure 1.
1) out
2) walk down the passage
3) nowgo [sic] to the pink room
4) back to the room with the plant
5) Go through the door on the left
6) go through opening with yellow wall paper
People describe routes using landmarks (4) or specific actions (2). They may describe the same object differently (5 vs 6). Instructions also differ in their scope (3 vs 1). Thus, even ignoring spelling and grammatical errors, navigation instructions contain considerable variation, which makes interpreting them a challenging problem.
3 Learning from previous interpretations
Our algorithm consists of two phases: annotation and interpretation. Annotation is performed only once and consists of automatically associating each IG instruction to an IF reaction. Interpretation is performed every time the system receives an instruction and consists of predicting an appropriate reaction given reactions observed in the corpus. Our method is based on the assumption that a reaction captures the semantics of the instruction that caused it. Therefore, if two utterances result in the same reaction, they are paraphrases of each other, and similar utterances should generate the same reaction. This approach enables us to predict reactions for previously-unseen instructions.
3.1 Annotation phase
The key challenge in learning from massive amounts of easily-collected data is to automatically annotate an unannotated corpus. Our annotation method consists of two parts: first, segmenting a low-level interaction trace into utterances and corresponding reactions, and second, discretizing those reactions into canonical action sequences. Segmentation enables our algorithm to learn from traces of IFs interacting directly with a virtual world. Since the IF can move freely in the virtual world, his actions are a stream of continuous behavior. Segmentation divides these traces into reactions that follow from each utterance of the IG. Consider the following example starting at the situation shown in Figure 1:
IG(1): go through the yellow opening
IF(2): [walks out of the room]
IF(3): [turns left at the intersection]
IF(4): [enters the room with the sofa]
IG(5): stop
It is not clear whether the IF is doing ⟨3, 4⟩ because he is reacting to 1 or because he is being proactive. While one could manually annotate this data to remove extraneous actions, our goal is to develop automated solutions that enable learning from massive amounts of data. We decided to approach this problem by experimenting with two alternative formal definitions: 1) a strict definition that considers the maximum reaction according to the IF behavior, and 2) a loose definition based on the empirical observation that, in situated interaction, most instructions are constrained by the current visually perceived affordances (Gibson, 1979; Stoia et al., 2006). We formally define behavior segmentation (Bhv) as follows.
A reaction rk to an instruction uk begins right after the instruction uk is uttered and ends right before the next instruction uk+1 is uttered. In the example, instruction 1 corresponds to ⟨2, 3, 4⟩. We formally define visibility segmentation (Vis) as follows. A reaction rk to an instruction uk begins right after the instruction uk is uttered and ends right before the next instruction uk+1 is uttered, or right after the IF leaves the area visible at 360° from where uk was uttered. In the example, instruction 1's reaction would be limited to ⟨2⟩ because the intersection is not visible from where the instruction was uttered.
The Bhv and Vis methods define how to segment an interaction trace into utterances and their corresponding reactions. However, users frequently perform noisy behavior that is irrelevant to the goal of the task. For example, after hearing an instruction, an IF might go into the wrong room, realize the error, and leave the room. A reaction should not include such irrelevant actions. In addition, IFs may accomplish the same goal using different behaviors: two different IFs may interpret "go to the pink room" by following different paths to the same destination. We would like to be able to generalize both reactions into one canonical reaction. As a result, our approach discretizes reactions into higher-level action sequences with less noise and less variation. Our discretization algorithm uses an automated planner and a planning representation of the task. This planning representation includes: (1) the task goal, (2) the actions which can be taken in the virtual world, and (3) the current state of the virtual world. Using the planning representation, the planner calculates an optimal path between the starting and ending states of the reaction, eliminating all unnecessary actions. While we use the classical planner FF (Hoffmann, 2003), our technique could also work with classical planning (Nau et al., 2004) or other techniques such as probabilistic planning (Bonet and Geffner, 2005). It is also not dependent on a particular discretization of the world in terms of actions. Now we are ready to define canonical reaction ck formally. Let Sk be the state of the virtual world when instruction uk was uttered, Sk+1 be the state of the world where the reaction ends (as defined by Bhv or Vis segmentation), and D be the planning domain representation of the virtual world. The canonical reaction to uk is defined as the sequence of actions returned by the planner with Sk as initial state, Sk+1 as goal state and D as planning domain.
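A minimal sketch of the Bhv definition above, assuming utterances and actions arrive as time-sorted (timestamp, payload) tuples; the record layout is an assumption, not the GIVE log format:

```python
def bhv_segment(utterances, actions):
    """Behavior (Bhv) segmentation: pair each instruction with every IF
    action performed after it and before the next instruction. Both inputs
    are time-sorted lists of (timestamp, payload) tuples."""
    pairs = []
    for k, (t_k, text) in enumerate(utterances):
        t_next = utterances[k + 1][0] if k + 1 < len(utterances) else float("inf")
        reaction = [a for t, a in actions if t_k < t < t_next]
        # Vis segmentation would additionally truncate `reaction` at the
        # first action taken outside the area visible from where the
        # instruction was uttered.
        pairs.append((text, reaction))
    return pairs

# Toy trace mirroring the example in Section 3.1 (timestamps in seconds).
utts = [(0.0, "go through the yellow opening"), (9.0, "stop")]
acts = [(2.0, "walk_out"), (4.0, "turn_left"), (6.0, "enter_sofa_room")]
print(bhv_segment(utts, acts))
```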
3.2 Interpretation phase
The annotation phase results in a collection of (uk, ck) pairs. The interpretation phase uses these pairs to interpret new utterances in three steps. First, we filter the set of pairs into those whose reactions can be directly executed from the current IF position. Second, we group the filtered pairs according to their reactions. Third, we select the group with utterances most similar to the new utterance, and output that group's reaction. Figure 2 shows the output of the first two steps: three groups of pairs whose reactions can all be executed from the IF's current position.
Figure 2: Utterance groups for this situation. Colored arrows show the reaction associated with each group.
We treat the third step, selecting the most similar group for a new utterance, as a classification problem. We compare three different classification methods. One method uses nearest-neighbor classification with three different similarity metrics: the Jaccard and Overlap coefficients (both of which measure the degree of overlap between two sets, differing only in the normalization of the final value (Nikravesh et al., 2005)), and Levenshtein Distance (a string metric for measuring the amount of differences between two sequences of words (Levenshtein, 1966)). Our second classification method employs a strategy in which we considered each group as a set of possible machine translations of our utterance, using the BLEU measure (Papineni et al., 2002) to select which group could be considered the best translation of our utterance. Finally, we trained an SVM classifier (Cortes and Vapnik, 1995) using the unigrams of each paraphrase and the position of the IF as features, and setting their group as the output class, using a libSVM wrapper (Chang and Lin, 2011).
When the system misinterprets an instruction, we use a similar approach to what people do in order to overcome misunderstandings. If the system executes an incorrect reaction, the IG can tell the system to cancel its current interpretation and try again using a paraphrase, selecting a different reaction.
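The nearest-neighbor group selection with the Jaccard coefficient (the strongest metric in Table 1) can be sketched as follows; the tokenization, the example groups, and the reaction names are simplified assumptions, not the system's actual data:

```python
def jaccard(a, b):
    """Jaccard coefficient between the token sets of two utterances."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if (a or b) else 0.0

def pick_reaction(new_utt, groups):
    """Among the reactions executable from the IF's current position,
    return the one whose group contains the utterance most similar to
    new_utt (nearest-neighbor classification, step three above)."""
    return max(groups, key=lambda r: max(jaccard(new_utt, u) for u in groups[r]))

groups = {  # reaction -> utterances previously annotated with it
    "go_through_yellow_door": ["go through the yellow opening",
                               "go through opening with yellow wall paper"],
    "enter_pink_room": ["nowgo to the pink room",
                        "back to the room with the plant"],
}
print(pick_reaction("walk through the yellow door", groups))
```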
4 Evaluation
For the evaluation phase, we annotated both the Cm and Cs corpora entirely, and then we split them in an 80/20 proportion; the first 80% of data collected in each virtual world was used for training, while the remaining 20% was used for testing. For each pair (uk, ck) in the testing set, we used our algorithm to predict the reaction to the selected utterance, and then compared this result against the automatically annotated reaction. Table 1 shows the results.

Table 1: Accuracy comparison between Cm and Cs for Bhv and Vis segmentation

Algorithm     Cm Bhv   Cm Vis   Cs Bhv   Cs Vis
Jaccard       47%      54%      54%      70%
Overlap       43%      53%      45%      60%
BLEU          44%      52%      54%      50%
SVM           33%      29%      45%      29%
Levenshtein   21%      20%      8%       17%

Comparing the Bhv and Vis segmentation strategies, Vis tends to obtain better results than Bhv. In addition, accuracy on the Cs corpus was generally higher than Cm. Given that Cs contained only one IG, we believe this led to less variability in the instructions and less noise in the training data.
We evaluated the impact of user corrections by simulating them using the existing corpus. In case of a wrong response, the algorithm receives a second utterance with the same reaction (a paraphrase of the previous one). Then the new utterance is tested over the same set of possible groups, except for the one which was returned before. If the correct reaction is not predicted after four tries, or there are no utterances with the same reaction, the predictions are registered as wrong. To measure the effects of user corrections vs. without, we used a different evaluation process for this algorithm: first, we split the corpus in a 50/50 proportion, and then we moved correctly predicted utterances from the testing set towards training, until either there was nothing more to learn or the training set reached 80% of the entire corpus size. As expected, user corrections significantly improve accuracy, as shown in Figure 3. The worst algorithm's results improve linearly with each try, while the best ones behave asymptotically, barely improving after the second try. The best algorithm reaches 92% with just one correction from the IG.
5 Discussion and future work
We presented an approach to instruction interpretation which learns from non-annotated logs of human behavior. Our empirical analysis shows that our best algorithm achieves 70% accuracy on this task, with no manual annotation required. When corrections are added, accuracy goes up to 92% for just one correction. We consider our results promising, since state-of-the-art semi-unsupervised approaches to instruction interpretation (Chen and Mooney, 2011) report a 55% accuracy on manually segmented data. We plan to compare our system's performance against human performance in comparable situations. Our informal observations of the GIVE corpus indicate that humans often follow instructions incorrectly, so our automated system's performance may be on par with human performance. Although we have presented our approach in the context of 3D virtual worlds, we believe our technique is also applicable to other domains such as the web, video games, or Human Robot Interaction.
Figure 3: Accuracy values with corrections over Cs
References
Luciana Benotti and Alexandre Denis. 2011. CL system: Giving instructions by corpus based selection. In Proceedings of the Generation Challenges Session at the 13th European Workshop on Natural Language Generation, pages 296–301, Nancy, France, September. Association for Computational Linguistics.
Blai Bonet and Héctor Geffner. 2005. mGPT: a probabilistic planner based on heuristic search. Journal of Artificial Intelligence Research, 24:933–944.
S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learning for mapping instructions to actions. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 82–90, Suntec, Singapore, August. Association for Computational Linguistics.
Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
David L. Chen and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), pages 859–865, August.
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20:273–297.
Myroslava O. Dzikovska, James F. Allen, and Mary D. Swift. 2008. Linking semantic and knowledge representations in a multi-domain dialogue system. Journal of Logic and Computation, 18:405–430, June.
Andrew Gargett, Konstantina Garoufi, Alexander Koller, and Kristina Striegnitz. 2010. The GIVE-2 corpus of giving instructions in virtual environments. In Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC), Malta.
James J. Gibson. 1979. The Ecological Approach to Visual Perception, volume 40. Houghton Mifflin.
Peter Gorniak and Deb Roy. 2007. Situated language understanding as filtering perceived affordances. Cognitive Science, 31(2):197–231.
Jörg Hoffmann. 2003. The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables. Journal of Artificial Intelligence Research (JAIR), 20:291–341.
Alexander Koller, Kristina Striegnitz, Andrew Gargett, Donna Byron, Justine Cassell, Robert Dale, Johanna Moore, and Jon Oberlander. 2010. Report on the second challenge on generating instructions in virtual environments (GIVE-2). In Proceedings of the 6th International Natural Language Generation Conference (INLG), Dublin.
Tessa Lau, Clemens Drews, and Jeffrey Nichols. 2009. Interpreting written how-to instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1433–1438, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Technical Report 8.
Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, pages 1475–1482. AAAI Press.
Cynthia Matuszek, Dieter Fox, and Karl Koscher. 2010. Following directions using statistical machine translation. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, HRI '10, pages 251–258, New York, NY, USA. ACM.
Dana Nau, Malik Ghallab, and Paolo Traverso. 2004. Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., California, USA.
Masoud Nikravesh, Tomohiro Takagi, Masanori Tajima, Akiyoshi Shinmura, Ryosuke Ohgaya, Koji Taniguchi, Kazuyosi Kawahara, Kouta Fukano, and Akiko Aizawa. 2005. Soft computing for perception-based decision processing and analysis: Web-based BISC-DSS. In Masoud Nikravesh, Lotfi Zadeh, and Janusz Kacprzyk, editors, Soft Computing for Information Processing and Analysis, volume 164 of Studies in Fuzziness and Soft Computing, chapter 4, pages 93–188. Springer Berlin / Heidelberg.
Jeff Orkin and Deb Roy. 2009. Automatic learning and generation of social behavior from collective human gameplay. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pages 385–392. International Foundation for Autonomous Agents and Multiagent Systems.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, Stroudsburg, PA, USA. Association for Computational Linguistics.
Verena Rieser and Oliver Lemon. 2010. Learning human multimodal dialogue strategies. Natural Language Engineering, 16:3–23.
Laura Stoia, Donna K. Byron, Darla Magdalene Shockley, and Eric Fosler-Lussier. 2006. Sentence planning for realtime navigational instructions. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL-Short '06, pages 157–160, Stroudsburg, PA, USA. Association for Computational Linguistics.
Adam Vogel and Dan Jurafsky. 2010. Learning to follow navigational directions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 806–814, Stroudsburg, PA, USA. Association for Computational Linguistics.
3 0.4190897 69 acl-2012-Deep Learning for NLP (without Magic)
4 0.40633431 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
Author: Malte Nuhn ; Arne Mauser ; Hermann Ney
Abstract: In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.
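To make the decipherment setting concrete, here is a toy stand-in: hill climbing over letter-substitution keys guided only by a monolingual bigram "language model". The paper's actual method is EM-based, works on words, and runs at far larger scale; everything here (data, scoring, search) is an illustrative assumption.

```python
import random

random.seed(0)
plain = "the cat sat on the mat and the dog ate the food"
alphabet = sorted(set(plain))

# Encipher with a random 1:1 substitution over the same alphabet.
key = dict(zip(alphabet, random.sample(alphabet, len(alphabet))))
cipher = "".join(key[c] for c in plain)

# Crude monolingual "LM": which character bigrams occur in English text.
bigrams = {plain[i:i + 2] for i in range(len(plain) - 1)}
def score(text):
    return sum(text[i:i + 2] in bigrams for i in range(len(text) - 1))

# Hill climbing: swap two mappings, keep the swap if the deciphered text
# looks more like the monolingual data. Toy-scale; may need restarts.
guess = dict(zip(alphabet, alphabet))
best = score("".join(guess[c] for c in cipher))
for _ in range(20000):
    a, b = random.sample(alphabet, 2)
    guess[a], guess[b] = guess[b], guess[a]
    s = score("".join(guess[c] for c in cipher))
    if s >= best:
        best = s
    else:
        guess[a], guess[b] = guess[b], guess[a]  # revert the swap

print("".join(guess[c] for c in cipher))
```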
6 0.36649853 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
7 0.36611372 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
8 0.36183184 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
9 0.36105418 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
10 0.3607128 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
12 0.35994363 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
13 0.35982427 191 acl-2012-Temporally Anchored Relation Extraction
14 0.35973978 167 acl-2012-QuickView: NLP-based Tweet Search
15 0.35957697 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
16 0.35956997 139 acl-2012-MIX Is Not a Tree-Adjoining Language
17 0.35943621 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
18 0.35927209 184 acl-2012-String Re-writing Kernel
19 0.35901985 136 acl-2012-Learning to Translate with Multiple Objectives
20 0.35880733 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models