69 acl-2012-Deep Learning for NLP (without Magic)

Source: pdf

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Computer Science Department, Stanford University ∗ DIRO, Université de Montréal, Montréal, QC, Canada 1 Abstract Machine learning is everywhere in today’s NLP, but by and large machine learning amounts to numerical optimization of weights for human-designed representations and features. [sent-6, score-0.46]

2 The goal of deep learning is to explore how computers can take advantage of data to develop features and representations appropriate for complex interpretation tasks. [sent-7, score-0.809]

3 This tutorial aims to cover the basic motivation, ideas, models and learning algorithms in deep learning for natural language processing. [sent-8, score-0.897]

4 Recently, these methods have been shown to perform very well on various NLP tasks such as language modeling, POS tagging, named entity recognition, sentiment analysis and paraphrase detection, among others. [sent-9, score-0.314]

5 The most attractive quality of these techniques is that they can perform well without any external hand-designed resources or time-intensive feature engineering. [sent-10, score-0.164]

6 Despite these advantages, many researchers in NLP are not familiar with these methods. [sent-11, score-0.145]

7 Our focus is on insight and understanding, using graphical illustrations and simple, intuitive derivations. [sent-12, score-0.41]

8 The goal of the tutorial is to make the inner workings of these techniques transparent, intuitive and their results interpretable, rather than black boxes labeled “magic here”. [sent-13, score-0.927]

9 The first part of the tutorial presents the basics of neural networks, neural word vectors, several simple models based on local windows and the math and algorithms of training via backpropagation. [sent-14, score-0.896]

10 In this section applications include language modeling and POS tagging. [sent-15, score-0.168]

11 In the second section we present recursive neural networks which can learn structured tree outputs as well as vector representations for phrases and sentences. [sent-16, score-0.647]

12 We show how training can be achieved by a modified version of the backpropagation algorithm introduced before. [sent-18, score-0.045]

13 These modifications allow the algorithm to work on tree structures. [sent-19, score-0.139]
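The recursive composition mentioned in sentences 11 to 13 can be sketched as follows. This is a minimal illustration and not the tutorial's own code: the embedding size `d`, the random initialization, the `tanh` nonlinearity, and the helper name `compose` are all assumptions chosen for the example. A single shared matrix maps two child vectors to a parent vector of the same size, so the same function applies at every node of a binary tree.

```python
import math
import random

def compose(tree, W, b, embeddings):
    """Return a vector for a binary tree given as nested 2-tuples of words."""
    if isinstance(tree, str):                 # leaf: look up the word vector
        return embeddings[tree]
    left, right = tree
    # Concatenate the two child vectors into one list of length 2d.
    children = compose(left, W, b, embeddings) + compose(right, W, b, embeddings)
    # Parent = tanh(W [left; right] + b); it has the same size as each child.
    return [math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
            for row, bias in zip(W, b)]

random.seed(0)
d = 4                                         # hypothetical embedding size
embeddings = {w: [random.gauss(0, 1) for _ in range(d)]
              for w in ["the", "cat", "sat"]}
W = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d

sentence_vec = compose((("the", "cat"), "sat"), W, b, embeddings)
print(len(sentence_vec))
```

Because parent and child vectors share one dimensionality, gradients can be passed down the tree with the same chain rule used for a layered network, which is one way to read the "modified backpropagation" the summary refers to.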

14 We also draw connections to recent work in semantic compositionality in vector spaces. [sent-21, score-0.261]

15 The principal goal, again, is to make these methods appear intuitive and interpretable rather than mathematically confusing. [sent-22, score-0.559]

16 By this point in the tutorial, the audience members should have a clear understanding of how to build a deep learning system for word-, sentence- and document-level tasks. [sent-23, score-0.596]

17 The last part of the tutorial gives a general overview of the different applications of deep learning in NLP, including bag of words models. [sent-24, score-0.851]

18 We will provide a discussion of NLP-oriented issues in modeling, interpretation, representational power, and optimization. [sent-25, score-0.164]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tutorial', 0.363), ('deep', 0.261), ('magic', 0.254), ('montr', 0.254), ('eal', 0.205), ('intuitive', 0.162), ('interpretable', 0.142), ('neural', 0.14), ('paraphrase', 0.138), ('nlp', 0.135), ('representations', 0.134), ('math', 0.127), ('representational', 0.116), ('illustrations', 0.116), ('networks', 0.115), ('anford', 0.108), ('audience', 0.108), ('socher', 0.108), ('qc', 0.108), ('yoshua', 0.108), ('interpretation', 0.108), ('sentiment', 0.104), ('bengio', 0.102), ('mathematically', 0.102), ('familiar', 0.102), ('cover', 0.098), ('today', 0.097), ('acsslo', 0.097), ('faogre', 0.097), ('jourliya', 0.097), ('transparent', 0.097), ('boxes', 0.093), ('jeju', 0.093), ('modifications', 0.086), ('universit', 0.086), ('manning', 0.085), ('equations', 0.084), ('inner', 0.084), ('windows', 0.081), ('bag', 0.081), ('attractive', 0.079), ('tic', 0.079), ('computers', 0.079), ('numerical', 0.079), ('recursive', 0.077), ('members', 0.077), ('goal', 0.076), ('connections', 0.075), ('insight', 0.075), ('richard', 0.071), ('understanding', 0.07), ('applications', 0.068), ('principle', 0.068), ('org', 0.066), ('canada', 0.066), ('ideas', 0.065), ('modeling', 0.064), ('ro', 0.062), ('motivation', 0.061), ('advantages', 0.061), ('republic', 0.061), ('black', 0.06), ('draw', 0.059), ('graphical', 0.057), ('power', 0.054), ('pos', 0.053), ('tree', 0.053), ('amounts', 0.05), ('aims', 0.05), ('rather', 0.049), ('issues', 0.048), ('vector', 0.048), ('vectors', 0.047), ('optimization', 0.046), ('outputs', 0.046), ('modified', 0.045), ('ac', 0.045), ('external', 0.045), ('ca', 0.045), ('algorithms', 0.045), ('stanford', 0.043), ('researchers', 0.043), ('despite', 0.043), ('develop', 0.041), ('techniques', 0.04), ('learning', 0.04), ('clear', 0.04), ('overview', 0.038), ('real', 0.038), ('detection', 0.036), ('edu', 0.036), ('explore', 0.036), ('include', 0.036), ('entity', 0.036), ('named', 0.036), ('appear', 0.036), ('designed', 0.035), ('christopher', 0.035), ('structured', 
0.034), ('advantage', 0.034)]
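The word weights above come from a TF-IDF model. The page does not document the exact weighting it uses, so the sketch below shows only one common variant (relative term frequency times `log(N / df)`); the toy corpus and the function name `tfidf` are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists -> list of {word: weight} dicts."""
    n = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights

# Hypothetical three-document corpus.
docs = [
    "deep learning tutorial for nlp".split(),
    "graph based learning tutorial for nlp".split(),
    "kernels for nlp".split(),
]
top = sorted(tfidf(docs)[0].items(), key=lambda kv: -kv[1])
print(top[0][0])
```

Words that occur in every document receive weight zero under this variant, which is why distinctive terms rise to the top of a list like the one above.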

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 69 acl-2012-Deep Learning for NLP (without Magic)

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

2 0.19326834 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP

Author: Amar Subramanya ; Partha Pratim Talukdar

Abstract: While labeled data is expensive to prepare, ever-increasing amounts of unlabeled linguistic data are becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer vision and NLP. In particular, recent NLP research has successfully used graph-based SSL algorithms for PoS tagging (Subramanya et al., 2010), semantic parsing (Das and Smith, 2011), knowledge acquisition (Talukdar et al., 2008), sentiment analysis (Goldberg and Zhu, 2006) and text categorization (Subramanya and Bilmes, 2008). Recognizing this promising and emerging area of research, this tutorial focuses on graph-based SSL algorithms (e.g., label propagation methods). The tutorial is intended to be a sequel to the ACL 2008 SSL tutorial, focusing exclusively on graph-based SSL methods and recent advances in this area, which were beyond the scope of the previous tutorial. The tutorial is divided into two parts. In the first part, we will motivate the need for graph-based SSL methods, introduce some standard graph-based SSL algorithms, and discuss connections between these approaches. We will also discuss how linguistic data can be encoded as graphs and show how graph-based algorithms can be scaled to large amounts of data (e.g., web-scale data). Part 2 of the tutorial will focus on how graph-based methods can be used to solve several critical NLP tasks, including basic problems such as PoS tagging, semantic parsing, and more downstream tasks such as text categorization, information acquisition, and sentiment analysis. We will conclude the tutorial with some exciting avenues for future work. Familiarity with semi-supervised learning and graph-based methods will not be assumed, and the necessary background will be provided. Examples from NLP tasks will be used throughout the tutorial to convey the necessary concepts. At the end of this tutorial, the attendee will walk away with the following:
• An in-depth knowledge of the current state-of-the-art in graph-based SSL algorithms, and the ability to implement them.
• The ability to decide on the suitability of graph-based SSL methods for a problem.
• Familiarity with different NLP tasks where graph-based SSL methods have been successfully applied.
In addition to the above goals, we hope that this tutorial will better prepare the attendee to conduct exciting research at the intersection of NLP and other emerging areas with natural graph-structured data (e.g., Computational Social Science). Please visit http://graph-ssl.wikidot.com/ for details.

References
Dipanjan Das and Noah A. Smith. 2011. Semi-supervised frame-semantic parsing for unknown predicates. In Proceedings of the ACL: Human Language Technologies.
Andrew B. Goldberg and Xiaojin Zhu. 2006. Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the Workshop on Graph Based Methods for NLP.
Amarnag Subramanya and Jeff Bilmes. 2008. Soft-supervised text classification. In EMNLP.
Amarnag Subramanya, Slav Petrov, and Fernando Pereira. 2010. Graph-based semi-supervised learning of structured tagging models. In EMNLP.
Partha Pratim Talukdar, Joseph Reisinger, Marius Pasca, Deepak Ravichandran, Rahul Bhagat, and Fernando Pereira. 2008. Weakly supervised acquisition of labeled class instances using graph random walks. In EMNLP.

Tutorial Abstracts of ACL 2012, page 6, Jeju, Republic of Korea, 8 July 2012. © 2012 Association for Computational Linguistics

3 0.16017555 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

Author: Eric Xing

Abstract: Probabilistic topic models have recently gained much popularity in information retrieval and related areas. Via such models, one can project high-dimensional objects such as text documents into a low-dimensional space where their latent semantics are captured and modeled; can integrate multiple sources of information, to “share statistical strength” among components of a hierarchical probabilistic model; and can structurally display and classify the otherwise unstructured object collections. However, to many practitioners, how topic models work, what to and not to expect from a topic model, how it is different from and related to classical matrix algebraic techniques such as LSI, NMF in NLP, how to empower topic models to deal with complex scenarios such as multimodal data, contractual text in social media, evolving corpus, or presence of supervision such as labeling and rating, how to make topic modeling computationally tractable even on web-scale data, etc., in a principled way, remain unclear. In this tutorial, I will demystify the conceptual, mathematical, and computational issues behind all such problems surrounding the topic models and their applications by presenting a systematic overview of the mathematical foundation of topic modeling, and its connections to a number of related methods popular in other fields such as the LDA, admixture model, mixed membership model, latent space models, and sparse coding. I will offer a simple and unifying view of all these techniques under the framework of multi-view latent space embedding, and outline the roadmap of model extension and algorithmic design toward different applications in IR and NLP. A main theme of this tutorial that ties together a wide range of issues and problems will build on the “probabilistic graphical model” formalism, a formalism that exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces. I will use this formalism as a main aid to discuss both the mathematical underpinnings for the models and the related computational issues in a unified, simplistic, transparent, and actionable fashion.

Tutorial Abstracts of ACL 2012, page 3, Jeju, Republic of Korea, 8 July 2012. © 2012 Association for Computational Linguistics

4 0.12708808 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

Author: Eric Huang ; Richard Socher ; Christopher Manning ; Andrew Ng

Abstract: Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.

5 0.12400395 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

Author: Alessandro Moschitti

Abstract: unknown-abstract

6 0.11095588 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

7 0.10800584 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

8 0.095148534 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

9 0.082751796 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

10 0.06850481 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

11 0.063312143 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

12 0.062081218 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

13 0.060679018 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

14 0.058378067 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

15 0.057551093 100 acl-2012-Fine Granular Aspect Analysis using Latent Structural Models

16 0.05744632 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

17 0.052181236 56 acl-2012-Computational Approaches to Sentence Completion

18 0.046955965 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

19 0.044902448 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

20 0.044065136 187 acl-2012-Subgroup Detection in Ideological Discussions

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.133), (1, 0.079), (2, 0.046), (3, -0.094), (4, 0.026), (5, 0.027), (6, 0.012), (7, 0.084), (8, -0.129), (9, -0.06), (10, -0.026), (11, 0.025), (12, 0.041), (13, 0.049), (14, 0.059), (15, 0.15), (16, -0.023), (17, -0.005), (18, -0.112), (19, 0.075), (20, 0.091), (21, 0.045), (22, -0.161), (23, 0.015), (24, 0.083), (25, 0.207), (26, 0.158), (27, -0.11), (28, 0.0), (29, 0.146), (30, 0.044), (31, -0.212), (32, 0.139), (33, -0.337), (34, -0.06), (35, 0.179), (36, 0.09), (37, 0.102), (38, -0.034), (39, 0.001), (40, 0.036), (41, 0.053), (42, 0.059), (43, 0.054), (44, 0.004), (45, -0.026), (46, 0.063), (47, -0.035), (48, 0.062), (49, -0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97878784 69 acl-2012-Deep Learning for NLP (without Magic)

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

2 0.88748246 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP

Author: Amar Subramanya ; Partha Pratim Talukdar

3 0.58885092 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

Author: Inderjeet Mani ; James Pustejovsky

Abstract: unknown-abstract

4 0.54615569 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

Author: Eric Xing

5 0.40579265 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

Author: Rada Mihalcea ; Carmen Banea ; Janyce Wiebe

Abstract: Subjectivity and sentiment analysis focuses on the automatic identification of private states, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations in natural language. While subjectivity classification labels text as either subjective or objective, sentiment classification adds an additional level of granularity, by further classifying subjective text as either positive, negative or neutral. While much of the research work in this area has been applied to English, research on other languages is growing, including Japanese, Chinese, German, Spanish, Romanian. While most of the researchers in the field are familiar with the methods applied on English, few of them have closely looked at the original research carried out in other languages. For example, in languages such as Chinese, researchers have been looking at the ability of characters to carry sentiment information (Ku et al., 2005; Xiang, 2011). In Romanian, due to markers of politeness and additional verbal modes embedded in the language, experiments have hinted that subjectivity detection may be easier to achieve (Banea et al., 2008). These additional sources of information may not be available across all languages, yet various articles have pointed out that by investigating a synergistic approach for detecting subjectivity and sentiment in multiple languages at the same time, improvements can be achieved not only in other languages, but in English as well. The development and interest in these methods is also highly motivated by the fact that only 27% of Internet users speak English (www.internetworldstats.com/stats.htm, Oct 11, 2011), and that number diminishes further every year, as more people across the globe gain Internet access. The aim of this tutorial is to familiarize the attendees with the subjectivity and sentiment research carried out on languages other than English in order to enable and promote cross-fertilization. Specifically, we will review work along three main directions. First, we will present methods where the resources and tools have been specifically developed for a given target language. In this category, we will also briefly overview the main methods that have been proposed for English, but which can be easily ported to other languages. Second, we will describe cross-lingual approaches, including several methods that have been proposed to leverage the resources and tools available in English by using cross-lingual projections. Finally, third, we will show how the expression of opinions and polarity pervades language boundaries, and thus methods that holistically explore multiple languages at the same time can be effectively considered.

References
C. Banea, R. Mihalcea, and J. Wiebe. 2008. A Bootstrapping method for building subjectivity lexicons for languages with scarce resources. In Proceedings of LREC 2008, Marrakech, Morocco.
L. W. Ku, T. H. Wu, L. Y. Lee, and H. H. Chen. 2005. Construction of an Evaluation Corpus for Opinion Extraction. In Proceedings of NTCIR-5, Tokyo, Japan.
L. Xiang. 2011. Ideogram Based Chinese Sentiment Word Orientation Computation. Computing Research Repository, page 4, October.

Tutorial Abstracts of ACL 2012, page 4, Jeju, Republic of Korea, 8 July 2012. © 2012 Association for Computational Linguistics

6 0.39445055 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

7 0.38199696 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

8 0.29275191 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

9 0.26275578 42 acl-2012-Bootstrapping via Graph Propagation

10 0.24204485 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

11 0.2313807 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench

12 0.22373204 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

13 0.20873766 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

14 0.20365869 56 acl-2012-Computational Approaches to Sentence Completion

15 0.20313028 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model

16 0.20130114 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

17 0.19354427 190 acl-2012-Syntactic Stylometry for Deception Detection

18 0.18478204 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

19 0.17973092 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection

20 0.17926796 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(26, 0.026), (28, 0.011), (30, 0.014), (37, 0.02), (39, 0.049), (59, 0.607), (74, 0.013), (90, 0.095), (92, 0.032), (99, 0.038)]
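The topic weights above are stored sparsely as `(topicId, topicWeight)` pairs. The page does not say how `simValue` is derived from them; if it were a cosine similarity (an assumption), the computation over two such sparse lists could look like the sketch below. The vector `other` is a made-up example, not data from this page.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors of (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    na = math.sqrt(sum(w * w for w in da.values()))
    nb = math.sqrt(sum(w * w for w in db.values()))
    return dot / (na * nb) if na and nb else 0.0

this_paper = [(26, 0.026), (28, 0.011), (30, 0.014), (37, 0.02), (39, 0.049),
              (59, 0.607), (74, 0.013), (90, 0.095), (92, 0.032), (99, 0.038)]
other = [(59, 0.5), (90, 0.2), (12, 0.1)]   # hypothetical second paper

print(round(cosine(this_paper, other), 3))
```

Under this reading, a paper that shares the dominant topic 59 would score high even when the remaining topics differ, since that one weight dominates the vector norm.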

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89555794 69 acl-2012-Deep Learning for NLP (without Magic)

Author: Richard Socher ; Yoshua Bengio ; Christopher D. Manning

Abstract: unknown-abstract

2 0.77130842 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments

Author: Luciana Benotti ; Martin Villalba ; Tessa Lau ; Julian Cerruti

Abstract: Previous approaches to instruction interpretation have required either extensive domain adaptation or manually annotated corpora. This paper presents a novel approach to instruction interpretation that leverages a large amount of unannotated, easy-to-collect data from humans interacting with a virtual world. We compare several algorithms for automatically segmenting and discretizing this data into (utterance, reaction) pairs and training a classifier to predict reactions given the next utterance. Our empirical analysis shows that the best algorithm achieves 70% accuracy on this task, with no manual annotation required.

1 Introduction and motivation

Mapping instructions into automatically executable actions would enable the creation of natural language interfaces to many applications (Lau et al., 2009; Branavan et al., 2009; Orkin and Roy, 2009). In this paper, we focus on the task of navigation and manipulation of a virtual environment (Vogel and Jurafsky, 2010; Chen and Mooney, 2011). Current symbolic approaches to the problem are brittle to the natural language variation present in instructions and require intensive rule authoring to be fit for a new task (Dzikovska et al., 2008). Current statistical approaches require extensive manual annotations of the corpora used for training (MacMahon et al., 2006; Matuszek et al., 2010; Gorniak and Roy, 2007; Rieser and Lemon, 2010). Manual annotation and rule authoring by natural language engineering experts are bottlenecks for developing conversational systems for new domains. This paper proposes a fully automated approach to interpreting natural language instructions to complete a task in a virtual world based on unsupervised recordings of human-human interactions performing that task in that virtual world.
Given unannotated corpora collected from humans following other humans’ instructions, our system automatically segments the corpus into labeled training data for a classification algorithm. Our interpretation algorithm is based on the observation that similar instructions uttered in similar contexts should lead to similar actions being taken in the virtual world. Given a previously unseen instruction, our system outputs actions that can be directly executed in the virtual world, based on what humans did when given similar instructions in the past.

2 Corpora situated in virtual worlds

Our environment consists of six virtual worlds designed for the natural language generation shared task known as the GIVE Challenge (Koller et al., 2010), where a pair of partners must collaborate to solve a task in a 3D space (Figure 1). The “instruction follower” (IF) can move around in the virtual world, but has no knowledge of the task. The “instruction giver” (IG) types instructions to the IF in order to guide him to accomplish the task. Each corpus contains the IF’s actions and position recorded every 200 milliseconds, as well as the IG’s instructions with their timestamps. We used two corpora for our experiments. The Cm corpus (Gargett et al., 2010) contains instructions given by multiple people, consisting of 37 games spanning 2163 instructions over 8:17 hs. The Cs corpus (Benotti and Denis, 2011), gathered using a single IG, is composed of 63 games and 3417 instructions, and was recorded in a span of 6:09 hs. It took less than 15 hours to collect the corpora through the web and the subjects reported that the experiment was fun.

Figure 1: A screenshot of a virtual world. The world consists of interconnecting hallways, rooms and objects

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 181–186, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics
While the environment is restricted, people describe the same route and the same objects in extremely different ways. Below are some examples of instructions from our corpus all given for the same route shown in Figure 1.
1) out
2) walk down the passage
3) nowgo [sic] to the pink room
4) back to the room with the plant
5) Go through the door on the left
6) go through opening with yellow wall paper
People describe routes using landmarks (4) or specific actions (2). They may describe the same object differently (5 vs 6). Instructions also differ in their scope (3 vs 1). Thus, even ignoring spelling and grammatical errors, navigation instructions contain considerable variation which makes interpreting them a challenging problem.

3 Learning from previous interpretations

Our algorithm consists of two phases: annotation and interpretation. Annotation is performed only once and consists of automatically associating each IG instruction to an IF reaction. Interpretation is performed every time the system receives an instruction and consists of predicting an appropriate reaction given reactions observed in the corpus. Our method is based on the assumption that a reaction captures the semantics of the instruction that caused it. Therefore, if two utterances result in the same reaction, they are paraphrases of each other, and similar utterances should generate the same reaction. This approach enables us to predict reactions for previously-unseen instructions.

3.1 Annotation phase

The key challenge in learning from massive amounts of easily-collected data is to automatically annotate an unannotated corpus. Our annotation method consists of two parts: first, segmenting a low-level interaction trace into utterances and corresponding reactions, and second, discretizing those reactions into canonical action sequences. Segmentation enables our algorithm to learn from traces of IFs interacting directly with a virtual world.
Since the IF can move freely in the virtual world, his actions are a stream of continuous behavior. Segmentation divides these traces into reactions that follow from each utterance of the IG. Consider the following example starting at the situation shown in Figure 1:
IG(1): go through the yellow opening
IF(2): [walks out of the room]
IF(3): [turns left at the intersection]
IF(4): [enters the room with the sofa]
IG(5): stop
It is not clear whether the IF is doing ⟨3, 4⟩ because he is reacting to 1 or because he is being proactive. While one could manually annotate this data to remove extraneous actions, our goal is to develop automated solutions that enable learning from massive amounts of data. We decided to approach this problem by experimenting with two alternative formal definitions: 1) a strict definition that considers the maximum reaction according to the IF behavior, and 2) a loose definition based on the empirical observation that, in situated interaction, most instructions are constrained by the current visually perceived affordances (Gibson, 1979; Stoia et al., 2006). We formally define behavior segmentation (Bhv) as follows. A reaction r_k to an instruction u_k begins right after the instruction u_k is uttered and ends right before the next instruction u_{k+1} is uttered. In the example, instruction 1 corresponds to ⟨2, 3, 4⟩. We formally define visibility segmentation (Vis) as follows. A reaction r_k to an instruction u_k begins right after the instruction u_k is uttered and ends right before the next instruction u_{k+1} is uttered or right after the IF leaves the area visible at 360° from where u_k was uttered. In the example, instruction 1’s reaction would be limited to ⟨2⟩ because the intersection is not visible from where the instruction was uttered. The Bhv and Vis methods define how to segment an interaction trace into utterances and their corresponding reactions.
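The two segmentation definitions can be sketched in a few lines. This is an illustrative reading of the definitions, not the authors' implementation: the event encoding, the function name `segment`, the `start` argument, and the toy `visible` predicate are all assumptions. Each IF event carries the location reached after the action; with `visible=None` the function performs Bhv segmentation, and with a predicate it performs Vis segmentation, closing the reaction right after the IF leaves the visible area.

```python
def segment(trace, start, visible=None):
    """Split a trace into (utterance, reaction) pairs.

    trace:   time-ordered events, ("IG", text) or ("IF", action, position).
    start:   IF position before the first event.
    visible: optional predicate visible(origin, position). None gives Bhv
             segmentation; a predicate gives Vis segmentation.
    """
    pairs, position = [], start
    utterance, reaction, origin, closed = None, [], None, False
    for event in trace:
        if event[0] == "IG":
            if utterance is not None:
                pairs.append((utterance, reaction))
            # A new instruction opens a new reaction at the current position.
            utterance, reaction, origin, closed = event[1], [], position, False
        else:
            _, action, position = event
            if utterance is None or closed:
                continue
            reaction.append(action)
            # Vis: the reaction ends right after the IF leaves the visible area.
            if visible is not None and not visible(origin, position):
                closed = True
    if utterance is not None:
        pairs.append((utterance, reaction))
    return pairs

trace = [
    ("IG", "go through the yellow opening"),
    ("IF", "walks out of the room", "hall"),
    ("IF", "turns left at the intersection", "intersection"),
    ("IF", "enters the room with the sofa", "sofa room"),
    ("IG", "stop"),
]
bhv = segment(trace, start="room")
# Toy visibility: only the location where the instruction was uttered is visible.
vis = segment(trace, start="room", visible=lambda origin, pos: origin == pos)
print(len(bhv[0][1]), len(vis[0][1]))
```

On this toy trace the Bhv reaction to the first instruction contains all three IF actions, while the Vis reaction contains only the first, mirroring the ⟨2, 3, 4⟩ versus ⟨2⟩ example in the text.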
However, users frequently perform noisy behavior that is irrelevant to the goal of the task. For example, after hearing an instruction, an IF might go into the wrong room, realize the error, and leave the room. A reaction should not include such irrelevant actions. In addition, IFs may accomplish the same goal using different behaviors: two different IFs may interpret “go to the pink room” by following different paths to the same destination. We would like to be able to generalize both reactions into one canonical reaction. As a result, our approach discretizes reactions into higher-level action sequences with less noise and less variation. Our discretization algorithm uses an automated planner and a planning representation of the task. This planning representation includes: (1) the task goal, (2) the actions which can be taken in the virtual world, and (3) the current state of the virtual world. Using the planning representation, the planner calculates an optimal path between the starting and ending states of the reaction, eliminating all unnecessary actions. While we use the classical planner FF (Hoffmann, 2003), our technique could also work with classical planning (Nau et al., 2004) or other techniques such as probabilistic planning (Bonet and Geffner, 2005). It is also not dependent on a particular discretization of the world in terms of actions. Now we are ready to define canonical reaction c_k formally. Let S_k be the state of the virtual world when instruction u_k was uttered, S_{k+1} be the state of the world where the reaction ends (as defined by Bhv or Vis segmentation), and D be the planning domain representation of the virtual world. The canonical reaction to u_k is defined as the sequence of actions returned by the planner with S_k as initial state, S_{k+1} as goal state and D as planning domain.

3.2 Interpretation phase

The annotation phase results in a collection of (u_k, c_k) pairs.
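To make the idea of a canonical reaction concrete, the sketch below replaces the FF planner used in the paper with a plain breadth-first search over a toy state graph. That substitution, the `domain` dictionary, and the function name `canonical_reaction` are all assumptions for illustration; the only property carried over from the text is that the result is a shortest action sequence from the start state to the end state, so detours such as entering and leaving a wrong room disappear.

```python
from collections import deque

def canonical_reaction(domain, start, goal):
    """Shortest action sequence from start to goal, or None if unreachable.

    domain: {state: {action: next_state}}, a toy stand-in for the planning
            domain D. Breadth-first search stands in for the planner.
    """
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan
        for action, nxt in domain.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None

# Hypothetical world: one hall connecting three rooms.
domain = {
    "room": {"exit room": "hall"},
    "hall": {"enter blue room": "blue room",
             "enter pink room": "pink room",
             "enter room": "room"},
    "blue room": {"leave blue room": "hall"},
    "pink room": {"leave pink room": "hall"},
}

# An IF who wandered into the blue room first still gets the same canonical plan.
print(canonical_reaction(domain, "room", "pink room"))
```

Because only the start and end states are given to the search, two IFs who reach the pink room by different routes are mapped to the same action sequence.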
The interpretation phase uses these pairs to interpret new utterances in three steps. First, we filter the set of pairs into those whose reactions can be directly executed from the current IF position. Second, we group the filtered pairs according to their reactions. Third, we select the group with utterances most similar to the new utterance, and output that group’s reaction. Figure 2 shows the output of the first two steps: three groups of pairs whose reactions can all be executed from the IF’s current position.

Figure 2: Utterance groups for this situation. Colored arrows show the reaction associated with each group.

We treat the third step, selecting the most similar group for a new utterance, as a classification problem. We compare three different classification methods. One method uses nearest-neighbor classification with three different similarity metrics: Jaccard and Overlap coefficients (both of which measure the degree of overlap between two sets, differing only in the normalization of the final value (Nikravesh et al., 2005)), and Levenshtein Distance (a string metric for measuring the amount of differences between two sequences of words (Levenshtein, 1966)). Our second classification method employs a strategy in which we considered each group as a set of possible machine translations of our utterance, using the BLEU measure (Papineni et al., 2002) to select which group could be considered the best translation of our utterance. Finally, we trained an SVM classifier (Cortes and Vapnik, 1995) using the unigrams of each paraphrase and the position of the IF as features, and setting their group as the output class using a libSVM wrapper (Chang and Lin, 2011).

Table 1: Accuracy comparison between Cm and Cs for Bhv and Vis segmentation

              Corpus Cm        Corpus Cs
Algorithm     Bhv     Vis      Bhv     Vis
Jaccard       47%     54%      54%     70%
Overlap       43%     53%      45%     60%
BLEU          44%     52%      54%     50%
SVM           33%     29%      45%     29%
Levenshtein   21%     20%      8%      17%
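The nearest-neighbor variant with the three similarity measures named above can be sketched as follows. The metric definitions are the standard ones; the grouping dictionary, the function name `nearest_group`, and the example utterances are assumptions for illustration and are not taken from the paper's corpus.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over word sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|) over word sets."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def levenshtein(a, b):
    """Word-level edit distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def nearest_group(utterance, groups, similarity=jaccard):
    """Return the reaction whose group holds the most similar known utterance."""
    words = utterance.split()
    return max(groups, key=lambda reaction: max(
        similarity(words, known.split()) for known in groups[reaction]))

# Hypothetical groups: reaction label -> utterances observed with that reaction.
groups = {
    "go to pink room": ["go to the pink room", "walk into the pink room"],
    "exit room": ["out", "leave the room"],
}
print(nearest_group("enter the pink room", groups))
```

Levenshtein is a distance rather than a similarity, so to plug it into `nearest_group` one would pass something like `lambda a, b: -levenshtein(a, b)`.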
When the system misinterprets an instruction, we follow an approach similar to what people do to overcome misunderstandings. If the system executes an incorrect reaction, the IG can tell the system to cancel its current interpretation and try again using a paraphrase, selecting a different reaction.
4 Evaluation
For the evaluation phase, we annotated both the Cm and Cs corpora entirely, and then we split them in an 80/20 proportion; the first 80% of data collected in each virtual world was used for training, while the remaining 20% was used for testing. For each pair (uk, ck) in the testing set, we used our algorithm to predict the reaction to the selected utterance, and then compared this result against the automatically annotated reaction. Table 1 shows the results.
Comparing the Bhv and Vis segmentation strategies, Vis tends to obtain better results than Bhv. In addition, accuracy on the Cs corpus was generally higher than on Cm. Given that Cs contained only one IG, we believe this led to less variability in the instructions and less noise in the training data.
We evaluated the impact of user corrections by simulating them using the existing corpus. In case of a wrong response, the algorithm receives a second utterance with the same reaction (a paraphrase of the previous one). Then the new utterance is tested over the same set of possible groups, except for the one which was returned before. If the correct reaction is not predicted after four tries, or there are no utterances with the same reaction, the predictions are registered as wrong. To compare accuracy with and without user corrections, we used a different evaluation process for this algorithm: first, we split the corpus in a 50/50 proportion, and then we moved correctly predicted utterances from the testing set to the training set, until either there was nothing more to learn or the training set reached 80% of the entire corpus size.
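The simulated correction loop described above can be sketched as below. This is an illustration under stated assumptions: `rank_groups` is a hypothetical callback standing in for any of the classifiers, the toy ranker and reaction names are invented, and the sketch counts the first attempt as one of the four tries, which the text does not spell out.

```python
def predict_with_corrections(paraphrases, correct, rank_groups, max_tries=4):
    """Return the 1-based try at which `correct` was predicted, else None.

    paraphrases: utterances that share the same reaction, original first.
    rank_groups(utterance, excluded): hypothetical callback returning the
        best reaction not in `excluded`, or None if none is left.
    """
    excluded = set()
    for attempt, utterance in enumerate(paraphrases[:max_tries], start=1):
        guess = rank_groups(utterance, excluded)
        if guess is None:
            return None          # no candidate groups left
        if guess == correct:
            return attempt
        excluded.add(guess)      # never return the same wrong group twice
    return None                  # out of tries or out of paraphrases


def toy_ranker(utterance, excluded):
    """Hypothetical classifier with a fixed preference order."""
    for reaction in ("press_red", "go_pink_room", "open_door"):
        if reaction not in excluded:
            return reaction
    return None


# Wrong on the first try, right after one correction.
print(predict_with_corrections(
    ["go pink", "enter the pink room"], "go_pink_room", toy_ranker))
```

Recording the try number rather than a plain success flag makes it straightforward to plot accuracy as a function of the number of corrections.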
As expected, user corrections significantly improve accuracy, as shown in Figure 3. The worst algorithm’s results improve linearly with each try, while the best ones behave asymptotically, barely improving after the second try. The best algorithm reaches 92% with just one correction from the IG.
Figure 3: Accuracy values with corrections over Cs
5 Discussion and future work
We presented an approach to instruction interpretation which learns from non-annotated logs of human behavior. Our empirical analysis shows that our best algorithm achieves 70% accuracy on this task, with no manual annotation required. When corrections are added, accuracy goes up to 92% with just one correction. We consider our results promising, since state-of-the-art semi-unsupervised approaches to instruction interpretation (Chen and Mooney, 2011) report 55% accuracy on manually segmented data.
We plan to compare our system’s performance against human performance in comparable situations. Our informal observations of the GIVE corpus indicate that humans often follow instructions incorrectly, so our automated system’s performance may be on par with human performance.
Although we have presented our approach in the context of 3D virtual worlds, we believe our technique is also applicable to other domains such as the web, video games, or Human Robot Interaction.
References
Luciana Benotti and Alexandre Denis. 2011. CL system: Giving instructions by corpus based selection. In Proceedings of the Generation Challenges Session at the 13th European Workshop on Natural Language Generation, pages 296–301, Nancy, France, September. Association for Computational Linguistics.
Blai Bonet and Héctor Geffner. 2005. mGPT: a probabilistic planner based on heuristic search. Journal of Artificial Intelligence Research, 24:933–944.
S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learning for mapping instructions to actions.
In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 82–90, Suntec, Singapore, August. Association for Computational Linguistics.
Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
David L. Chen and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), pages 859–865, August.
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20:273–297.
Myroslava O. Dzikovska, James F. Allen, and Mary D. Swift. 2008. Linking semantic and knowledge representations in a multi-domain dialogue system. Journal of Logic and Computation, 18:405–430, June.
Andrew Gargett, Konstantina Garoufi, Alexander Koller, and Kristina Striegnitz. 2010. The GIVE-2 corpus of giving instructions in virtual environments. In Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC), Malta.
James J. Gibson. 1979. The Ecological Approach to Visual Perception, volume 40. Houghton Mifflin.
Peter Gorniak and Deb Roy. 2007. Situated language understanding as filtering perceived affordances. Cognitive Science, 31(2):197–231.
Jörg Hoffmann. 2003. The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables. Journal of Artificial Intelligence Research (JAIR), 20:291–341.
Alexander Koller, Kristina Striegnitz, Andrew Gargett, Donna Byron, Justine Cassell, Robert Dale, Johanna Moore, and Jon Oberlander. 2010. Report on the second challenge on generating instructions in virtual environments (GIVE-2).
In Proceedings of the 6th International Natural Language Generation Conference (INLG), Dublin.
Tessa Lau, Clemens Drews, and Jeffrey Nichols. 2009. Interpreting written how-to instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1433–1438, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Technical Report 8.
Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, pages 1475–1482. AAAI Press.
Cynthia Matuszek, Dieter Fox, and Karl Koscher. 2010. Following directions using statistical machine translation. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, HRI '10, pages 251–258, New York, NY, USA. ACM.
Dana Nau, Malik Ghallab, and Paolo Traverso. 2004. Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., California, USA.
Masoud Nikravesh, Tomohiro Takagi, Masanori Tajima, Akiyoshi Shinmura, Ryosuke Ohgaya, Koji Taniguchi, Kazuyosi Kawahara, Kouta Fukano, and Akiko Aizawa. 2005. Soft computing for perception-based decision processing and analysis: Web-based BISC-DSS. In Masoud Nikravesh, Lotfi Zadeh, and Janusz Kacprzyk, editors, Soft Computing for Information Processing and Analysis, volume 164 of Studies in Fuzziness and Soft Computing, chapter 4, pages 93–188. Springer Berlin / Heidelberg.
Jeff Orkin and Deb Roy. 2009. Automatic learning and generation of social behavior from collective human gameplay. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pages 385–392. International Foundation for Autonomous Agents and Multiagent Systems.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, Stroudsburg, PA, USA. Association for Computational Linguistics.
Verena Rieser and Oliver Lemon. 2010. Learning human multimodal dialogue strategies. Natural Language Engineering, 16:3–23.
Laura Stoia, Donna K. Byron, Darla Magdalene Shockley, and Eric Fosler-Lussier. 2006. Sentence planning for realtime navigational instructions. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL-Short '06, pages 157–160, Stroudsburg, PA, USA. Association for Computational Linguistics.
Adam Vogel and Dan Jurafsky. 2010. Learning to follow navigational directions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 806–814, Stroudsburg, PA, USA. Association for Computational Linguistics.

3 0.51929659 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors

Author: Malte Nuhn ; Arne Mauser ; Hermann Ney

Abstract: In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.

4 0.41189992 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP

Author: Amar Subramanya ; Partha Pratim Talukdar

5 0.40723631 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

Author: Eric Xing

6 0.35372436 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

7 0.29193199 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

8 0.28605711 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

9 0.25703374 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

10 0.25454327 151 acl-2012-Multilingual Subjectivity and Sentiment Analysis

11 0.25072458 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems

12 0.24513023 129 acl-2012-Learning High-Level Planning from Text

13 0.24466158 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content