acl acl2012 acl2012-93 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Chen
Abstract: Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. It is especially important in language grounding where the training data usually consist of language paired with an ambiguous perceptual context. Recent work by Chen and Mooney (2011) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. In this paper we introduce a new online algorithm that is an order of magnitude faster and surpasses the state-of-the-art results. We show that by changing the grammar of the formal meaning representation language and training on additional data collected from Amazon’s Mechanical Turk we can further improve the results. We also include experimental results on a Chinese translation of the training data to demonstrate the generality of our approach.
Reference: text
sentIndex sentText sentNum sentScore
1 Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. [sent-3, score-0.369]
2 Recent work by Chen and Mooney (2011) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. [sent-5, score-0.385]
3 While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. [sent-6, score-0.697]
4 More recently, there has been work on learning from ambiguous supervision where a set of potential sentence meanings are given, only one (or a small subset) of which are correct (Chen and Mooney, 2008; Liang et al. [sent-13, score-0.255]
5 Building a lexicon of the formal meaning representations of words and phrases, either implicitly or explicitly, is usually an important step in inferring the meanings of entire sentences. [sent-17, score-0.358]
6 In particular, Chen and Mooney (2011) first learned a lexicon to help them resolve ambiguous supervision of relational data in which the number of choices is exponential. [sent-18, score-0.347]
7 Their lexicon learning algorithm finds the common connected subgraph that occurs with a word by taking intersections of the graphs that represent the different contexts in which the word appears. [sent-20, score-0.633]
8 While the algorithm produced a good lexicon for their application of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. [sent-21, score-0.889]
9 However, constructing such semantic annotations can be expensive. In this paper we introduce a new online algorithm that is an order of magnitude faster and also produces better results on their navigation task. [sent-26, score-0.696]
10 In addition to the new lexicon learning algorithm, we also look at modifying the meaning representation grammar (MRG) for their formal semantic language. [sent-27, score-0.422]
11 By using an MRG that correlates better with the structure of natural language, we further improve the performance on the navigation task. [sent-28, score-0.511]
12 2 Background A common way to learn a lexicon across many different contexts is to find the common parts of the formal representations associated with different occurrences of the same words or phrases (Siskind, 1996). [sent-33, score-0.234]
13 For graphical representations, this involves finding the common subgraph between multiple graphs (Thompson and Mooney, 2003; Chen and Mooney, 2011). [sent-34, score-0.219]
14 In this section we review the lexicon learning algorithm introduced by Chen and Mooney (2011) as well as the overall task they designed to test semantic understanding of navigation instructions. [sent-35, score-0.853]
15 2.1 Navigation Task The goal of the navigation task is to build a system that can understand free-form natural-language instructions and follow them to move to the intended destination (MacMahon et al. [sent-37, score-0.758]
16 Chen and Mooney (2011) defined a learning task in which the only supervision the system receives is in the form of observing how humans behave when following sample navigation instructions in a virtual world. [sent-41, score-0.891]
17 Given a natural language instruction (e.g. turn your back against the wall and walk to the couch), Chen and Mooney first use a set of rules to automatically construct the space of reasonable plans using the action trace and knowledge about the world. [sent-51, score-0.281]
18 The space is represented compactly using a graph, as shown in Figure 1 (examples of landmarks plans constructed by Chen and Mooney (2011) and how they computed the intersection of two graphs). [sent-52, score-0.317]
19 Given that these landmarks plans contain a lot of extraneous details, Chen and Mooney learn a lexicon and use it to identify and remove the irrelevant parts of the plans. [sent-55, score-0.427]
20 The remaining refined landmarks plans are then treated as supervised training data for a semantic-parser learner, KRISP (Kate and Mooney, 2006). [sent-57, score-0.266]
21 Once a semantic parser is trained, it can be used at test time to transform novel instructions into formal navigation plans which are then carried out by a virtual robot (MacMahon et al. [sent-58, score-1.096]
22 2 Lexicon Learning The central component of the system is the lexicon learning process which associates words and short phrases (n-grams) to their meanings (connected graphs). [sent-61, score-0.292]
23 To learn the meaning of an n-gram w, Chen and Mooney first collect all navigation plans g that co-occur with w. [sent-62, score-0.73]
24 They use the term intersection to mean a maximal common subgraph (i.e. [sent-65, score-0.223]
25 it is not a subgraph of any other common subgraph). [sent-67, score-0.17]
26 In this case, they bias toward finding large connected components by greedily removing the largest common connected subgraph from both graphs until the two graphs have no overlapping nodes. [sent-69, score-0.386]
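To make the batch intersection step concrete, the following is a minimal Python sketch (using networkx, and assuming plan-graph nodes are identified by their labels); it illustrates the greedy strategy just described and is not Chen and Mooney's actual implementation.

```python
import networkx as nx

def greedy_intersection(g1, g2):
    """Greedily pull out the largest connected subgraph common to both plan
    graphs until they no longer overlap (an approximation of the maximal
    common subgraph; assumes nodes are identified by their labels)."""
    g1, g2 = g1.copy(), g2.copy()
    pieces = []
    while True:
        # Common part: nodes and edges present in both graphs.
        common = nx.Graph()
        common.add_nodes_from(n for n in g1 if n in g2)
        common.add_edges_from(e for e in g1.edges if g2.has_edge(*e))
        if common.number_of_nodes() == 0:
            break
        # Keep the largest connected common component, then remove it from both.
        largest = max(nx.connected_components(common), key=len)
        pieces.append(common.subgraph(largest).copy())
        g1.remove_nodes_from(largest)
        g2.remove_nodes_from(largest)
    return pieces
```

Because this pairwise intersection has to be repeated across all the graphs that co-occur with an n-gram, the batch approach becomes expensive on large datasets, which motivates the online algorithm below.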
27 Moreover, reducing the beam size could also hurt the quality of the lexicon learned. [sent-78, score-0.192]
28 In this section, we present another lexicon learning algorithm that is much faster and works in an online setting. [sent-79, score-0.343]
29 Using this constraint, we generate all the potential small connected subgraphs for each navigation plan in the training examples and discard the original graph. [sent-82, score-0.739]
30 As we encounter each new training example, which consists of a written navigation instruction and the corresponding navigation plan, we update our counts as outlined in Algorithm 1 (SUBGRAPH GENERATION ONLINE LEXICON LEARNING, SGOLL; input: a sequence of navigation instructions and the corresponding navigation plans (e1, p1), ...). [sent-84, score-0.47]
31 From the corresponding navigation plan, we find all connected subgraphs of size less than or equal to m. [sent-88, score-0.708]
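A brute-force sketch of this enumeration step, assuming networkx plan graphs (the function name and representation are illustrative; with a small bound such as m = 3 the brute force over node subsets is affordable):

```python
from itertools import combinations
import networkx as nx

def connected_subgraphs(plan, max_size=3):
    """Return all subgraphs of the plan induced by node subsets of size
    <= max_size that are connected; brute force, so only suitable for
    small max_size."""
    found = []
    for k in range(1, max_size + 1):
        for nodes in combinations(plan.nodes, k):
            candidate = plan.subgraph(nodes)
            if nx.is_connected(candidate.to_undirected()):
                found.append(candidate.copy())
    return found
```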
32 We then update the co-occurrence counts between all the n-grams w and all the connected subgraphs g. [sent-89, score-0.268]
33 We also update the counts of how many examples we have encountered so far and counts of the n-grams w and subgraphs g. [sent-90, score-0.245]
34 At any given time, we can compute a lexicon using these various counts. [sent-91, score-0.192]
35 Specifically, for each n-gram w, we look at all the subgraphs g that co-occurred with it, and compute a score for the pair (w, g). [sent-92, score-0.17]
36 This is to prevent rarely seen n-grams from receiving high scores in our lexicon simply due to their sparsity. [sent-96, score-0.192]
37 In our experiments, we set the score threshold to 0.4, the maximum subgraph size to m = 3, and the minimum support to minSup = 10. [sent-98, score-0.17]
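The sketch below puts the counting and scoring steps together. The class name, the data structures, and in particular the scoring formula are assumptions made for illustration; the paper's exact score function is not reproduced in this text.

```python
from collections import Counter, defaultdict

class SGOLLCounts:
    """Count-keeping sketch for SGOLL; names and the scoring formula are
    illustrative assumptions, not taken verbatim from the paper."""

    def __init__(self, max_ngram=4, min_support=10, threshold=0.4):
        self.num_examples = 0
        self.ngram_count = Counter()      # count(w)
        self.subgraph_count = Counter()   # count(g)
        self.cooc = defaultdict(Counter)  # cooc[w][g]
        self.max_ngram = max_ngram
        self.min_support = min_support
        self.threshold = threshold

    def update(self, tokens, subgraph_keys):
        """Fold in one (instruction, plan) example; subgraph_keys are the
        hashable canonical forms of the plan's small connected subgraphs."""
        self.num_examples += 1
        ngrams = {tuple(tokens[i:i + n])
                  for n in range(1, self.max_ngram + 1)
                  for i in range(len(tokens) - n + 1)}
        for g in subgraph_keys:
            self.subgraph_count[g] += 1
        for w in ngrams:
            self.ngram_count[w] += 1
            for g in subgraph_keys:
                self.cooc[w][g] += 1

    def lexicon(self):
        """Score every (n-gram, subgraph) pair, ignoring rarely seen n-grams."""
        entries = []
        for w, graphs in self.cooc.items():
            if self.ngram_count[w] < self.min_support:
                continue  # rare n-grams would otherwise get spuriously high scores
            for g, c in graphs.items():
                # One plausible score: p(g | w) minus g's overall frequency.
                score = c / self.ngram_count[w] - self.subgraph_count[g] / self.num_examples
                if score >= self.threshold:
                    entries.append((w, g, score))
        return sorted(entries, key=lambda e: -e[2])
```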
38 It should be noted that SGOLL can also become computationally intractable if the sizes of the navigation plans are large or if we set the maximum subgraph size m to a large number. [sent-99, score-0.336]
39 This allows us to determine if two graphs are identical in constant time and also lets us use a hash table to quickly update the co-occurrence and subgraph counts. [sent-103, score-0.254]
40 Thus, even given a large number of subgraphs for each training example, each subgraph can be processed very quickly. [sent-104, score-0.339]
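One way to obtain such a constant-time identity check is to reduce every small subgraph to a hashable canonical key built from its node and edge labels. The sketch below is an assumption about how this could look; note that it can conflate non-isomorphic subgraphs whose label multisets happen to coincide, which a full canonical-labeling routine would avoid.

```python
def canonical_key(sg):
    """Hashable key for a small labeled plan subgraph, built only from labels
    so that two structurally identical subgraphs compare equal in O(1)."""
    node_part = tuple(sorted(sg.nodes[n]["label"] for n in sg))
    edge_part = tuple(sorted((sg.nodes[u]["label"], sg.nodes[v]["label"])
                             for u, v in sg.edges))
    return node_part, edge_part
```

Keys produced this way can index the hash tables that store the co-occurrence and subgraph counts.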
41 3.1 Changing the Meaning Representation Grammar In addition to introducing a new lexicon learning algorithm, we also made another modification to the original system proposed by Chen and Mooney (2011). [sent-109, score-0.221]
42 To train a semantic parser using KRISP (Kate and Mooney, 2006), they had to supply an MRG, a context-free grammar, for their formal navigation plan language. [sent-110, score-0.685]
43 For example, the original MRG contained the following production rules for generating an infinite number of travel actions from the root symbol S. [sent-116, score-0.22]
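The actual production rules are not reproduced in this text. The fragment below is a purely hypothetical illustration, written with NLTK's CFG notation and made-up action names, of the kind of recursive root production that generates an unbounded sequence of travel actions, contrasted with a flatter grammar that lines up better with individual sentences.

```python
import nltk

# Hypothetical recursive MRG fragment: S can expand into arbitrarily many actions.
recursive_mrg = nltk.CFG.fromstring("""
    S -> ACTION | ACTION S
    ACTION -> 'travel' | 'turn_left' | 'verify_at_sofa'
""")

# Hypothetical flatter alternative: each sentence maps to a single bounded action.
flat_mrg = nltk.CFG.fromstring("""
    S -> 'travel' | 'turn_left' | 'verify_at_sofa'
""")
```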
44 4 Collecting Additional Data with Mechanical Turk One of the motivations for studying ambiguous supervision is the potential ease of acquiring large amounts of training data. [sent-120, score-0.186]
45 We validate this claim by collecting additional training data for the navigation domain using Mechanical Turk (Snow et al. [sent-122, score-0.593]
46 There are two types of data we are interested in collecting: natural language navigation instructions and follower data. [sent-124, score-0.867]
47 The first one asks the workers to supply instructions for a randomly generated sequence of actions. [sent-126, score-0.319]
48 The second one asks the workers to try to follow a given navigation instruction in our virtual environment. [sent-127, score-0.819]
49 The latter task is used to generate the corresponding action sequences for instructions collected from the first task. [sent-128, score-0.304]
50 The worker is given a navigation instruction and placed at the starting location. [sent-135, score-0.701]
51 They are asked to follow the navigation instruction as best they can using the three discrete controls. [sent-136, score-0.691]
52 They can also skip the problem if they do not understand the instruction or if the instruction does not describe a viable route. [sent-137, score-0.302]
53 For each Human Intelligence Task (HIT), we asked the worker to complete 5 follower problems. [sent-138, score-0.177]
54 The instructions used for the follower problems were mainly collected from our Mechanical Turk instructor task with some of the instructions coming from data collected by MacMahon (2007) that was not used by Chen and Mooney (2011). [sent-141, score-0.71]
55 The instructor task is slightly more involved because we ask the workers to provide new navigation instructions. [sent-142, score-0.671]
56 Table 1: Statistics about the navigation instruction corpora. [sent-147, score-0.662]
57 The worker is shown a 3D simulation of a randomly generated action sequence of length 1 to 4 and asked to write short, free-form instructions that would lead someone to perform those actions. [sent-150, score-0.305]
58 Workers who were identified as consistently providing good instructions were allowed to do a higher-paying version of the same HITs that paid $0. [sent-155, score-0.248]
59 2 Data Statistics Over a 2-month period we accepted 2,884 follower HITs and 810 instructor HITs from 653 workers. [sent-159, score-0.198]
60 This corresponds to over 14,000 follower traces and 2,400 instructions with most of them consisting of single sentences. [sent-160, score-0.401]
61 For instructions with multiple sentences, we merged all the sentences together and treated the result as a single sentence. [sent-161, score-0.218]
62 First, we discarded any instructions that did not have at least 5 follower traces. [sent-165, score-0.356]
63 Then we looked at all the follower traces and discarded any instruction that did not have majority agreement on what the correct path is. [sent-166, score-0.334]
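A small sketch of this filtering step; the data structures (a mapping from instruction ids to lists of hashable follower paths) and the function name are assumptions.

```python
from collections import Counter

def filter_instructions(follower_traces, min_traces=5):
    """Keep an instruction only if it has at least min_traces follower traces
    and a strict majority of those traces agree on the resulting path."""
    kept = {}
    for instr_id, paths in follower_traces.items():
        if len(paths) < min_traces:
            continue
        path, votes = Counter(paths).most_common(1)[0]
        if votes > len(paths) / 2:   # majority agreement on the path
            kept[instr_id] = path
    return kept
```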
64 Overall, the new corpus has a slightly smaller vocabulary, and each instruction is slightly shorter both in terms of the number of words and the number of actions. [sent-169, score-0.205]
65 5 Experiments We evaluate our new lexicon learning algorithm as well as the other modifications to the navigation system using the same three tasks as Chen and Mooney (201 1). [sent-170, score-0.779]
66 The first task is disambiguating the training data by inferring the correct navigation plans associated with each training sentence. [sent-171, score-0.739]
67 The second task is evaluating the performance of the semantic parsers trained on the disambiguated data. [sent-172, score-0.167]
68 Finally, the third task is to complete the end-to-end navigation task. [sent-174, score-0.511]
69 There are two versions of this task: the complete task uses the original instructions, which are several sentences long, and the other version uses instructions that have been manually split into single sentences. [sent-175, score-0.436]
70 For the first task, we build a lexicon using ambiguous training data from two maps, and then use the lexicon to produce the best disambiguated semantic meanings for those same data. [sent-178, score-0.677]
71 For all comparisons to the Chen and Mooney results, we use the performance of their refined landmarks plans system, which performed the best overall. [sent-180, score-0.166]
72 Moreover, it provides the most direct comparison to our approach since both use a lexicon to refine the landmarks plans. [sent-181, score-0.261]
73 1 Inferring Navigation Plans First, we examine the quality of the refined navigation plans produced using SGOLL’s lexicon. [sent-185, score-0.677]
74 Table 2: Partial parse accuracy of how well each algorithm can infer the gold-standard navigation plans. [sent-189, score-0.558]
75 Table 3: Partial parse accuracy of the semantic parsers trained on the disambiguated navigation plans. [sent-193, score-0.678]
76 The precision, recall, and F1 (harmonic mean of precision and recall) of these plans are shown in Table 2. [sent-194, score-0.166]
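For reference, the F1 reported here is the standard harmonic mean of precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
```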
77 Compared to Chen and Mooney, the plans produced by SGOLL have higher precision and lower recall. [sent-195, score-0.166]
78 2 Training Semantic Parsers Next we look at the performance of the semantic parsers trained on the inferred navigation plans. [sent-198, score-0.67]
79 3 Executing Navigation Plans Finally, we evaluate the system on the end-to-end navigation task. [sent-204, score-0.511]
80 SGOLL outperforms Chen and Mooney’s system on both versions of the navigation task. [sent-208, score-0.511]
81 Table 4: End-to-end navigation task completion rates. [sent-210, score-0.548]
82 Finally, augmenting the training data with additional instructions and follower traces collected from Mechanical Turk produced the best results. [sent-214, score-0.47]
83 The average time (in seconds) it takes for each algorithm to build a lexicon is shown in Table 5. [sent-217, score-0.239]
84 Here SGOLL has a decidedly large advantage over the lexicon learning algorithm from Chen and Mooney, requiring an order of magnitude less time to run. [sent-220, score-0.3]
85 This shows how inefficient it is to perform graph intersection operations and how our online algorithm can more realistically scale to large datasets. [sent-222, score-0.199]
86 6 Discussion We have introduced a novel, online lexicon learning algorithm that is much faster than the one proposed by Chen and Mooney and also performs better on the navigation tasks they devised. [sent-244, score-0.854]
87 Compared to systems that train on supervised semantic annotations, a system that only receives weak, ambiguous training data is expected to have to train on much larger datasets to achieve similar performance. [sent-246, score-0.182]
88 Moreover, the online nature of SGOLL allows the lexicon to be continually updated while the system is in use. [sent-249, score-0.235]
89 A deployed navigation system can gather new instructions from the user and receive feedback about whether it is performing the correct actions. [sent-250, score-0.729]
90 As new training examples are collected, we can update the corresponding n-gram and subgraph counts without rebuilding the entire lexicon. [sent-251, score-0.272]
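Continuing the hypothetical count-keeping sketch from earlier, folding in a newly collected example is a single update call, after which the lexicon can be re-scored from the stored counts rather than rebuilt from scratch:

```python
# Assumes the SGOLLCounts and canonical_key sketches above (hypothetical names).
counts = SGOLLCounts(min_support=1)  # low support only for this toy example
counts.update(["walk", "to", "the", "couch"],
              subgraph_keys=[(("travel",), ()), (("verify_at_sofa",), ())])
new_lexicon = counts.lexicon()       # re-scored from the counts; no full rebuild
```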
91 One thing to note though is that while SGOLL makes the lexicon learning step much faster and scalable, another bottleneck in the overall system is training the semantic parser. [sent-252, score-0.358]
92 Parallel to the work of learning from ambiguous supervision, other recent work has also looked at training semantic parsers from supervision other than logical-form annotations. [sent-277, score-0.342]
93 In this paper we presented a novel online algorithm for building a lexicon from ambiguously supervised relational data. [sent-286, score-0.282]
94 In contrast to the previous approach that computed common subgraphs between different contexts in which an n-gram appeared, we instead focus on small, connected subgraphs and introduce an algorithm, SGOLL, that is an order of magnitude faster. [sent-287, score-0.367]
95 In addition to being more scalable, SGOLL also performed better on the task of interpreting navigation instructions. [sent-288, score-0.511]
96 In addition, we showed that changing the MRG and collecting additional training data from Mechanical Turk further improve the performance of the overall navigation system. [sent-289, score-0.62]
97 Finally, we demonstrated the generality of the system by using it to learn Chinese navigation instructions and achieved similar results. [sent-290, score-0.764]
98 Label ranking under ambiguous supervision for learning semantic correspondences. [sent-299, score-0.258]
99 Learning to interpret natural language navigation instructions from observations. [sent-326, score-0.779]
100 Generative alignment and semantic parsing for learning from ambiguous supervision. [sent-354, score-0.18]
wordName wordTfidf (topN-words)
[('navigation', 0.511), ('mooney', 0.29), ('sgoll', 0.278), ('instructions', 0.218), ('mrg', 0.208), ('lexicon', 0.192), ('subgraph', 0.17), ('plans', 0.166), ('instruction', 0.151), ('chen', 0.15), ('follower', 0.138), ('subgraphs', 0.138), ('macmahon', 0.106), ('act', 0.101), ('travel', 0.1), ('mechanical', 0.097), ('krisp', 0.091), ('intersections', 0.087), ('ion', 0.084), ('supervision', 0.078), ('ambiguous', 0.077), ('semantic', 0.074), ('workers', 0.073), ('meanings', 0.071), ('trave', 0.069), ('landmarks', 0.069), ('turk', 0.064), ('instructor', 0.06), ('connected', 0.059), ('virtual', 0.055), ('kate', 0.055), ('parsers', 0.053), ('intersection', 0.053), ('meaning', 0.053), ('actions', 0.052), ('hit', 0.052), ('num', 0.052), ('collecting', 0.051), ('interpret', 0.05), ('graphs', 0.049), ('action', 0.048), ('algorithm', 0.047), ('traces', 0.045), ('bordes', 0.045), ('minsup', 0.045), ('raymond', 0.045), ('luke', 0.044), ('online', 0.043), ('production', 0.042), ('formal', 0.042), ('goldwasser', 0.041), ('walk', 0.041), ('zettlemoyer', 0.041), ('disambiguated', 0.04), ('ei', 0.039), ('worker', 0.039), ('perceptual', 0.039), ('collected', 0.038), ('completion', 0.037), ('counts', 0.036), ('update', 0.035), ('generality', 0.035), ('end', 0.035), ('hits', 0.035), ('route', 0.035), ('artzi', 0.035), ('borschinger', 0.035), ('fazly', 0.035), ('kollar', 0.035), ('outputlexicon', 0.035), ('shimizu', 0.035), ('sportscasting', 0.035), ('weather', 0.035), ('batch', 0.033), ('clarke', 0.033), ('magnitude', 0.032), ('faster', 0.032), ('look', 0.032), ('training', 0.031), ('pi', 0.031), ('hri', 0.03), ('matuszek', 0.03), ('kwiatkowski', 0.03), ('paid', 0.03), ('parser', 0.03), ('liang', 0.03), ('learning', 0.029), ('grounded', 0.029), ('follow', 0.029), ('graph', 0.029), ('moreover', 0.028), ('supply', 0.028), ('cynthia', 0.028), ('rohit', 0.028), ('siskind', 0.028), ('lu', 0.027), ('changing', 0.027), ('scale', 0.027), ('slightly', 0.027), ('rules', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000012 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition
Author: David Chen
Abstract: Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. It is especially important in language grounding where the training data usually consist of language paired with an ambiguous perceptual context. Recent work by Chen and Mooney (2011) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. In this paper we introduce a new online algorithm that is an order of magnitude faster and surpasses the state-of-the-art results. We show that by changing the grammar of the formal meaning representation language and training on additional data collected from Amazon’s Mechanical Turk we can further improve the results. We also include experimental results on a Chinese translation of the training data to demonstrate the generality of our approach.
2 0.42532447 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
Author: Srinivasan Janarthanam ; Oliver Lemon ; Xingkun Liu
Abstract: We demonstrate a web-based environment for development and testing of different pedestrian route instruction-giving systems. The environment contains a City Model, a TTS interface, a game-world, and a user GUI including a simulated street-view. We describe the environment and components, the metrics that can be used for the evaluation of pedestrian route instruction-giving systems, and the shared challenge which is being organised using this environment.
3 0.23706514 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
Author: Luciana Benotti ; Martin Villalba ; Tessa Lau ; Julian Cerruti
Abstract: Previous approaches to instruction interpretation have required either extensive domain adaptation or manually annotated corpora. This paper presents a novel approach to instruction interpretation that leverages a large amount of unannotated, easy-to-collect data from humans interacting with a virtual world. We compare several algorithms for automatically segmenting and discretizing this data into (utterance, reaction) pairs and training a classifier to predict reactions given the next utterance. Our empirical analysis shows that the best algorithm achieves 70% accuracy on this task, with no manual annotation required.
4 0.088892311 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
Author: Gregory Druck ; Bo Pang
Abstract: There are a growing number of popular web sites where users submit and review instructions for completing tasks as varied as building a table and baking a pie. In addition to providing their subjective evaluation, reviewers often provide actionable refinements. These refinements clarify, correct, improve, or provide alternatives to the original instructions. However, identifying and reading all relevant reviews is a daunting task for a user. In this paper, we propose a generative model that jointly identifies user-proposed refinements in instruction reviews at multiple granularities, and aligns them to the appropriate steps in the original instructions. Labeled data is not readily available for these tasks, so we focus on the unsupervised setting. In experiments in the recipe domain, our model provides 90.1% F1 for predicting refinements at the review level, and 77.0% F1 for predicting refinement segments within reviews.
5 0.08440394 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
Author: Xiaodong He ; Li Deng
Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In the IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
6 0.080332182 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
7 0.076489858 170 acl-2012-Robust Conversion of CCG Derivations to Phrase Structure Trees
8 0.071960546 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
9 0.068471812 129 acl-2012-Learning High-Level Planning from Text
10 0.068180494 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places
11 0.066290505 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
12 0.061697472 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons
13 0.061196461 14 acl-2012-A Joint Model for Discovery of Aspects in Utterances
14 0.058104098 64 acl-2012-Crosslingual Induction of Semantic Roles
15 0.057833806 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
16 0.057344254 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
17 0.056626614 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
18 0.055595115 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
19 0.054780226 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
20 0.054683086 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
topicId topicWeight
[(0, -0.177), (1, 0.028), (2, -0.03), (3, -0.006), (4, -0.013), (5, 0.085), (6, 0.028), (7, 0.064), (8, -0.022), (9, 0.061), (10, -0.011), (11, 0.169), (12, -0.267), (13, 0.175), (14, -0.209), (15, -0.0), (16, -0.329), (17, 0.071), (18, -0.027), (19, 0.118), (20, 0.016), (21, -0.035), (22, -0.036), (23, 0.38), (24, 0.105), (25, -0.016), (26, -0.118), (27, 0.118), (28, -0.077), (29, 0.082), (30, -0.062), (31, 0.104), (32, -0.039), (33, 0.01), (34, -0.022), (35, -0.011), (36, 0.101), (37, -0.003), (38, 0.003), (39, 0.06), (40, 0.057), (41, -0.08), (42, -0.001), (43, 0.069), (44, -0.01), (45, -0.01), (46, 0.075), (47, -0.007), (48, -0.005), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.93024886 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition
Author: David Chen
2 0.87508595 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
Author: Srinivasan Janarthanam ; Oliver Lemon ; Xingkun Liu
3 0.83983266 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
Author: Luciana Benotti ; Martin Villalba ; Tessa Lau ; Julian Cerruti
Abstract: Previous approaches to instruction interpretation have required either extensive domain adaptation or manually annotated corpora. This paper presents a novel approach to instruction interpretation that leverages a large amount of unannotated, easy-to-collect data from humans interacting with a virtual world. We compare several algorithms for automatically segmenting and discretizing this data into (utterance, reaction) pairs and training a classifier to predict reactions given the next utterance. Our empirical analysis shows that the best algorithm achieves 70% accuracy on this task, with no manual annotation required. 1 Introduction and motivation Mapping instructions into automatically executable actions would enable the creation of natural lan- , guage interfaces to many applications (Lau et al., 2009; Branavan et al., 2009; Orkin and Roy, 2009). In this paper, we focus on the task of navigation and manipulation of a virtual environment (Vogel and Jurafsky, 2010; Chen and Mooney, 2011). Current symbolic approaches to the problem are brittle to the natural language variation present in instructions and require intensive rule authoring to be fit for a new task (Dzikovska et al., 2008). Current statistical approaches require extensive manual annotations of the corpora used for training (MacMahon et al., 2006; Matuszek et al., 2010; Gorniak and Roy, 2007; Rieser and Lemon, 2010). Manual annotation and rule authoring by natural language engineering experts are bottlenecks for developing conversational systems for new domains. 181 t e s s al au @ us . ibm . com, j ce rrut i ar .ibm . com @ This paper proposes a fully automated approach to interpreting natural language instructions to complete a task in a virtual world based on unsupervised recordings of human-human interactions perform- ing that task in that virtual world. Given unannotated corpora collected from humans following other humans’ instructions, our system automatically segments the corpus into labeled training data for a classification algorithm. Our interpretation algorithm is based on the observation that similar instructions uttered in similar contexts should lead to similar actions being taken in the virtual world. Given a previously unseen instruction, our system outputs actions that can be directly executed in the virtual world, based on what humans did when given similar instructions in the past. 2 Corpora situated in virtual worlds Our environment consists of six virtual worlds designed for the natural language generation shared task known as the GIVE Challenge (Koller et al., 2010), where a pair of partners must collaborate to solve a task in a 3D space (Figure 1). The “instruction follower” (IF) can move around in the virtual world, but has no knowledge of the task. The “instruction giver” (IG) types instructions to the IF in order to guide him to accomplish the task. Each corpus contains the IF’s actions and position recorded every 200 milliseconds, as well as the IG’s instruc- tions with their timestamps. We used two corpora for our experiments. The Cm corpus (Gargett et al., 2010) contains instructions given by multiple people, consisting of 37 games spanning 2163 instructions over 8: 17 hs. The Proce dJienjgus, R ofep thueb 5lic0t hof A Knonruea ,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fso rc Ciatoiomnp fuotart Cio nmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi1c 8s1–186, Figure 1: A screenshot of a virtual world. 
The world consists of interconnecting hallways, rooms and objects Cs corpus (Benotti and Denis, 2011), gathered using a single IG, is composed of 63 games and 3417 in- structions, and was recorded in a span of 6:09 hs. It took less than 15 hours to collect the corpora through the web and the subjects reported that the experiment was fun. While the environment is restricted, people describe the same route and the same objects in extremely different ways. Below are some examples of instructions from our corpus all given for the same route shown in Figure 1. 1) out 2) walk down the passage 3) nowgo [sic] to the pink room 4) back to the room with the plant 5) Go through the door on the left 6) go through opening with yellow wall paper People describe routes using landmarks (4) or specific actions (2). They may describe the same object differently (5 vs 6). Instructions also differ in their scope (3 vs 1). Thus, even ignoring spelling and grammatical errors, navigation instructions contain considerable variation which makes interpreting them a challenging problem. 3 Learning from previous interpretations Our algorithm consists of two phases: annotation and interpretation. Annotation is performed only once and consists of automatically associating each IG instruction to an IF reaction. Interpretation is performed every time the system receives an instruc182 tion and consists of predicting an appropriate reaction given reactions observed in the corpus. Our method is based on the assumption that a reaction captures the semantics of the instruction that caused it. Therefore, if two utterances result in the same reaction, they are paraphrases of each other, and similar utterances should generate the same reaction. This approach enables us to predict reactions for previously-unseen instructions. 3.1 Annotation phase The key challenge in learning from massive amounts of easily-collected data is to automatically annotate an unannotated corpus. Our annotation method consists of two parts: first, segmenting a low-level interaction trace into utterances and corresponding reactions, and second, discretizing those reactions into canonical action sequences. Segmentation enables our algorithm to learn from traces of IFs interacting directly with a virtual world. Since the IF can move freely in the virtual world, his actions are a stream of continuous behavior. Segmentation divides these traces into reactions that follow from each utterance of the IG. Consider the following example starting at the situation shown in Figure 1: IG(1): go through the yellow opening IF(2): [walks out of the room] IF(3): [turns left at the intersection] IF(4): [enters the room with the sofa] IG(5): stop It is not clear whether the IF is doing h3, 4i because h neo tis c reacting htoe r1 t or Fbec isadu soei hge h 3is, being proactive. While one could manually annotate this data to remove extraneous actions, our goal is to develop automated solutions that enable learning from massive amounts of data. We decided to approach this problem by experimenting with two alternative formal definitions: 1) a strict definition that considers the maximum reaction according to the IF behavior, and 2) a loose defini- tion based on the empirical observation that, in situated interaction, most instructions are constrained by the current visually perceived affordances (Gibson, 1979; Stoia et al., 2006). We formally define behavior segmentation (Bhv) as follows. 
A reaction rk to an instruction uk begins right after the instruction uk is uttered and ends right before the next instruction uk+1 is uttered. In the example, instruction 1corresponds to h2, 3, 4i . We formally d inefsitnrue visibility segmentation (Vis) as f Wolelows. A reaction rk to an instruction uk begins right after the instruction uk is uttered and ends right before the next instruction uk+1 is uttered or right after the IF leaves the area visible at 360◦ from where uk was uttered. In the example, instruction 1’s reaction would be limited to h2i because the intersection is nwootu vldisi bbele l ifmroimte dw htoer he2 tihe b eicnasutrsuec ttihoen was suetctetiroend. The Bhv and Vis methods define how to segment an interaction trace into utterances and their corresponding reactions. However, users frequently perform noisy behavior that is irrelevant to the goal of the task. For example, after hearing an instruction, an IF might go into the wrong room, realize the error, and leave the room. A reaction should not in- clude such irrelevant actions. In addition, IFs may accomplish the same goal using different behaviors: two different IFs may interpret “go to the pink room” by following different paths to the same destination. We would like to be able to generalize both reactions into one canonical reaction. As a result, our approach discretizes reactions into higher-level action sequences with less noise and less variation. Our discretization algorithm uses an automated planner and a planning representation of the task. This planning representation includes: (1) the task goal, (2) the actions which can be taken in the virtual world, and (3) the current state of the virtual world. Using the planning representation, the planner calculates an optimal path between the starting and ending states of the reaction, eliminating all unnecessary actions. While we use the classical planner FF (Hoffmann, 2003), our technique could also work with classical planning (Nau et al., 2004) or other techniques such as probabilistic planning (Bonet and Geffner, 2005). It is also not dependent on a particular discretization of the world in terms of actions. Now we are ready to define canonical reaction ck formally. Let Sk be the state of the virtual world when instruction uk was uttered, Sk+1 be the state of the world where the reaction ends (as defined by Bhv or Vis segmentation), and D be the planning domain representation of the virtual world. The canonical reaction to uk is defined as the sequence of actions 183 returned by the planner with Sk as initial state, Sk+1 as goal state and D as planning domain. 3.2 Interpretation phase The annotation phase results in a collection of (uk, ck) pairs. The interpretation phase uses these pairs to interpret new utterances in three steps. First, we filter the set of pairs into those whose reactions can be directly executed from the current IF position. Second, we group the filtered pairs according to their reactions. Third, we select the group with utterances most similar to the new utterance, and output that group’s reaction. Figure 2 shows the output of the first two steps: three groups of pairs whose reactions can all be executed from the IF’s current position. Figure 2: Utterance groups for this situation. Colored arrows show the reaction associated with each group. We treat the third step, selecting the most similar group for a new utterance, as a classification problem. We compare three different classification methods. 
One method uses nearest-neighbor classification with three different similarity metrics: the Jaccard and Overlap coefficients (both of which measure the degree of overlap between two sets, differing only in the normalization of the final value (Nikravesh et al., 2005)), and Levenshtein distance (a string metric for measuring the amount of difference between two sequences of words (Levenshtein, 1966)). Our second classification method treats each group as a set of possible machine translations of the utterance, using the BLEU measure (Papineni et al., 2002) to select the group that can be considered the best translation of the utterance. Finally, we trained an SVM classifier (Cortes and Vapnik, 1995) using the unigrams of each paraphrase and the position of the IF as features, with the group as the output class, using a libSVM wrapper (Chang and Lin, 2011).

When the system misinterprets an instruction, we use an approach similar to what people do to overcome misunderstandings. If the system executes an incorrect reaction, the IG can tell the system to cancel its current interpretation and try again using a paraphrase, selecting a different reaction.

4 Evaluation

For the evaluation phase, we annotated both the Cm and Cs corpora entirely, and then we split them in an 80/20 proportion; the first 80% of the data collected in each virtual world was used for training, while the remaining 20% was used for testing. For each pair (u_k, c_k) in the testing set, we used our algorithm to predict the reaction to the selected utterance, and then compared this result against the automatically annotated reaction. Table 1 shows the results.

Table 1: Accuracy comparison between Cm and Cs for Bhv and Vis segmentation

                 Corpus Cm        Corpus Cs
Algorithm        Bhv    Vis       Bhv    Vis
Jaccard          47%    54%       54%    70%
Overlap          43%    53%       45%    60%
BLEU             44%    52%       54%    50%
SVM              33%    29%       45%    29%
Levenshtein      21%    20%        8%    17%

Comparing the Bhv and Vis segmentation strategies, Vis tends to obtain better results than Bhv. In addition, accuracy on the Cs corpus was generally higher than on Cm. Given that Cs contained only one IG, we believe this led to less variability in the instructions and less noise in the training data.

We evaluated the impact of user corrections by simulating them using the existing corpus. In case of a wrong response, the algorithm receives a second utterance with the same reaction (a paraphrase of the previous one). The new utterance is then tested over the same set of possible groups, except for the one that was returned before. If the correct reaction is not predicted after four tries, or if there are no utterances left with the same reaction, the prediction is registered as wrong. To measure the effect of user corrections versus no corrections, we used a different evaluation process for this algorithm: first, we split the corpus in a 50/50 proportion, and then we moved correctly predicted utterances from the testing set to the training set, until either there was nothing more to learn or the training set reached 80% of the entire corpus size.

As expected, user corrections significantly improve accuracy, as shown in Figure 3. The worst algorithm's results improve linearly with each try, while the best ones behave asymptotically, barely improving after the second try. The best algorithm reaches 92% with just one correction from the IG. The retry loop used in this simulation is sketched below.
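The correction protocol can be written as a short retry loop. The sketch below is an assumption-laden illustration rather than the authors' evaluation code: interpret_excluding stands for a variant of the interpretation step that skips previously rejected reactions, and the paraphrase pool is whatever recorded utterances share the gold reaction.

```python
from typing import Callable, Sequence, Tuple

Reaction = Tuple[str, ...]


def predict_with_corrections(paraphrases: Sequence[str],
                             gold_reaction: Reaction,
                             interpret_excluding: Callable[[str, set], Reaction],
                             max_tries: int = 4) -> bool:
    """Simulate IG corrections: after a wrong prediction, the rejected reaction is
    excluded and a recorded paraphrase of the same instruction is tried instead."""
    excluded: set = set()
    for utterance in paraphrases[:max_tries]:
        predicted = interpret_excluding(utterance, excluded)
        if predicted == gold_reaction:
            return True          # correct within the allowed number of tries
        excluded.add(predicted)
    return False                 # wrong after max_tries, or no paraphrases left
```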
5 Discussion and future work

We presented an approach to instruction interpretation which learns from non-annotated logs of human behavior. Our empirical analysis shows that our best algorithm achieves 70% accuracy on this task, with no manual annotation required. When corrections are added, accuracy goes up to 92% with just one correction. We consider our results promising, since state-of-the-art semi-unsupervised approaches to instruction interpretation (Chen and Mooney, 2011) report 55% accuracy on manually segmented data.

We plan to compare our system's performance against human performance in comparable situations. Our informal observations of the GIVE corpus indicate that humans often follow instructions incorrectly, so our automated system's performance may be on par with human performance. Although we have presented our approach in the context of 3D virtual worlds, we believe our technique is also applicable to other domains such as the web, video games, or human-robot interaction.

Figure 3: Accuracy values with corrections over Cs

References

Luciana Benotti and Alexandre Denis. 2011. CL system: Giving instructions by corpus based selection. In Proceedings of the Generation Challenges Session at the 13th European Workshop on Natural Language Generation, pages 296–301, Nancy, France, September. Association for Computational Linguistics.

Blai Bonet and Héctor Geffner. 2005. mGPT: a probabilistic planner based on heuristic search. Journal of Artificial Intelligence Research, 24:933–944.

S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learning for mapping instructions to actions. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 82–90, Suntec, Singapore, August. Association for Computational Linguistics.

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

David L. Chen and Raymond J. Mooney. 2011. Learning to interpret natural language navigation instructions from observations. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), pages 859–865, August.

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine Learning, 20:273–297.

Myroslava O. Dzikovska, James F. Allen, and Mary D. Swift. 2008. Linking semantic and knowledge representations in a multi-domain dialogue system. Journal of Logic and Computation, 18:405–430, June.

Andrew Gargett, Konstantina Garoufi, Alexander Koller, and Kristina Striegnitz. 2010. The GIVE-2 corpus of giving instructions in virtual environments. In Proceedings of the 7th Conference on International Language Resources and Evaluation (LREC), Malta.

James J. Gibson. 1979. The Ecological Approach to Visual Perception, volume 40. Houghton Mifflin.

Peter Gorniak and Deb Roy. 2007. Situated language understanding as filtering perceived affordances. Cognitive Science, 31(2):197–231.

Jörg Hoffmann. 2003. The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables. Journal of Artificial Intelligence Research (JAIR), 20:291–341.

Alexander Koller, Kristina Striegnitz, Andrew Gargett, Donna Byron, Justine Cassell, Robert Dale, Johanna Moore, and Jon Oberlander. 2010. Report on the second challenge on generating instructions in virtual environments (GIVE-2). In Proceedings of the 6th International Natural Language Generation Conference (INLG), Dublin.
Tessa Lau, Clemens Drews, and Jeffrey Nichols. 2009. Interpreting written how-to instructions. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, pages 1433–1438, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Technical Report 8.

Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. 2006. Walk the talk: connecting language, knowledge, and action in route instructions. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, pages 1475–1482. AAAI Press.

Cynthia Matuszek, Dieter Fox, and Karl Koscher. 2010. Following directions using statistical machine translation. In Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, HRI '10, pages 251–258, New York, NY, USA. ACM.

Dana Nau, Malik Ghallab, and Paolo Traverso. 2004. Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., California, USA.

Masoud Nikravesh, Tomohiro Takagi, Masanori Tajima, Akiyoshi Shinmura, Ryosuke Ohgaya, Koji Taniguchi, Kazuyosi Kawahara, Kouta Fukano, and Akiko Aizawa. 2005. Soft computing for perception-based decision processing and analysis: Web-based BISC-DSS. In Masoud Nikravesh, Lotfi Zadeh, and Janusz Kacprzyk, editors, Soft Computing for Information Processing and Analysis, volume 164 of Studies in Fuzziness and Soft Computing, chapter 4, pages 93–188. Springer Berlin / Heidelberg.

Jeff Orkin and Deb Roy. 2009. Automatic learning and generation of social behavior from collective human gameplay. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pages 385–392. International Foundation for Autonomous Agents and Multiagent Systems.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, Stroudsburg, PA, USA. Association for Computational Linguistics.

Verena Rieser and Oliver Lemon. 2010. Learning human multimodal dialogue strategies. Natural Language Engineering, 16:3–23.

Laura Stoia, Donna K. Byron, Darla Magdalene Shockley, and Eric Fosler-Lussier. 2006. Sentence planning for realtime navigational instructions. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL-Short '06, pages 157–160, Stroudsburg, PA, USA. Association for Computational Linguistics.

Adam Vogel and Dan Jurafsky. 2010. Learning to follow navigational directions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 806–814, Stroudsburg, PA, USA. Association for Computational Linguistics.
4 0.45377854 129 acl-2012-Learning High-Level Planning from Text
Author: S.R.K. Branavan ; Nate Kushman ; Tao Lei ; Regina Barzilay
Abstract: Comprehending action preconditions and effects is an essential step in modeling the dynamics of the world. In this paper, we express the semantics of precondition relations extracted from text in terms of planning operations. The challenge of modeling this connection is to ground language at the level of relations. This type of grounding enables us to create high-level plans based on language abstractions. Our model jointly learns to predict precondition relations from text and to perform high-level planning guided by those relations. We implement this idea in the reinforcement learning framework using feedback automatically obtained from plan execution attempts. When applied to a complex virtual world and text describing that world, our relation extraction technique performs on par with a supervised baseline, yielding an F-measure of 66% compared to the baseline's 65%. Additionally, we show that a high-level planner utilizing these extracted relations significantly outperforms a strong, text-unaware baseline, successfully completing 80% of planning tasks as compared to 69% for the baseline.
5 0.30106735 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions
Author: Inderjeet Mani ; James Pustejovsky
Abstract: unkown-abstract
6 0.29205588 182 acl-2012-Spice it up? Mining Refinements to Online Instructions from User Generated Content
7 0.285018 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
8 0.26426587 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
9 0.241715 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors
10 0.23560551 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
11 0.23465729 77 acl-2012-Ecological Evaluation of Persuasive Messages Using Google AdWords
12 0.22450843 26 acl-2012-Applications of GPC Rules and Character Structures in Games for Learning Chinese Characters
13 0.2085342 56 acl-2012-Computational Approaches to Sentence Completion
14 0.20615129 70 acl-2012-Demonstration of IlluMe: Creating Ambient According to Instant Message Logs
15 0.20324269 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
16 0.20181915 170 acl-2012-Robust Conversion of CCG Derivations to Phrase Structure Trees
17 0.20131437 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
18 0.20003441 42 acl-2012-Bootstrapping via Graph Propagation
19 0.19745062 196 acl-2012-The OpenGrm open-source finite-state grammar software libraries
20 0.19125018 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
topicId topicWeight
[(25, 0.018), (26, 0.032), (28, 0.037), (30, 0.027), (37, 0.041), (39, 0.047), (57, 0.012), (59, 0.062), (74, 0.037), (82, 0.025), (84, 0.379), (85, 0.023), (90, 0.089), (92, 0.041), (94, 0.02), (99, 0.047)]
simIndex simValue paperId paperTitle
1 0.92893642 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
Author: Reut Tsarfaty ; Joakim Nivre ; Evelina Andersson
Abstract: We present novel metrics for parse evaluation in joint segmentation and parsing scenarios where the gold sequence of terminals is not known in advance. The protocol uses distance-based metrics defined for the space of trees over lattices. Our metrics allow us to precisely quantify the performance gap between non-realistic parsing scenarios (assuming gold segmented and tagged input) and realistic ones (not assuming gold segmentation and tags). Our evaluation of segmentation and parsing for Modern Hebrew sheds new light on the performance of the best parsing systems to date in the different scenarios.
2 0.90407121 68 acl-2012-Decoding Running Key Ciphers
Author: Sravana Reddy ; Kevin Knight
Abstract: There has been recent interest in the problem of decoding letter substitution ciphers using techniques inspired by natural language processing. We consider a different type of classical encoding scheme known as the running key cipher, and propose a search solution using Gibbs sampling with a word language model. We evaluate our method on synthetic ciphertexts of different lengths, and find that it outperforms previous work that employs Viterbi decoding with character-based models.
3 0.89425045 195 acl-2012-The Creation of a Corpus of English Metalanguage
Author: Shomir Wilson
Abstract: Metalanguage is an essential linguistic mechanism which allows us to communicate explicit information about language itself. However, it has been underexamined in research in language technologies, to the detriment of the performance of systems that could exploit it. This paper describes the creation of the first tagged and delineated corpus of English metalanguage, accompanied by an explicit definition and a rubric for identifying the phenomenon in text. This resource will provide a basis for further studies of metalanguage and enable its utilization in language technologies.
4 0.88210952 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
Author: Preethi Raghavan ; Albert Lai ; Eric Fosler-Lussier
Abstract: We investigate the problem of ordering medical events in unstructured clinical narratives by learning to rank them based on their time of occurrence. We represent each medical event as a time duration, with a corresponding start and stop, and learn to rank the starts/stops based on their proximity to the admission date. Such a representation allows us to learn all of Allen's temporal relations between medical events. Interestingly, we observe that this methodology performs better than a classification-based approach for this domain, but worse on the relationships found in the Timebank corpus. This finding has important implications for styles of data representation and resources used for temporal relation learning: clinical narratives may have different language attributes corresponding to temporal ordering relative to Timebank, implying that the field may need to look at a wider range of domains to fully understand the nature of temporal ordering.
same-paper 5 0.84113789 93 acl-2012-Fast Online Lexicon Learning for Grounded Language Acquisition
Author: David Chen
Abstract: Learning a semantic lexicon is often an important first step in building a system that learns to interpret the meaning of natural language. It is especially important in language grounding where the training data usually consist of language paired with an ambiguous perceptual context. Recent work by Chen and Mooney (201 1) introduced a lexicon learning method that deals with ambiguous relational data by taking intersections of graphs. While the algorithm produced good lexicons for the task of learning to interpret navigation instructions, it only works in batch settings and does not scale well to large datasets. In this paper we introduce a new online algorithm that is an order of magnitude faster and surpasses the stateof-the-art results. We show that by changing the grammar of the formal meaning represen- . tation language and training on additional data collected from Amazon’s Mechanical Turk we can further improve the results. We also include experimental results on a Chinese translation of the training data to demonstrate the generality of our approach.
6 0.5071314 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
7 0.50268602 59 acl-2012-Corpus-based Interpretation of Instructions in Virtual Environments
8 0.47444913 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation
9 0.47037148 104 acl-2012-Graph-based Semi-Supervised Learning Algorithms for NLP
10 0.46820635 34 acl-2012-Automatically Learning Measures of Child Language Development
12 0.46582246 210 acl-2012-Unsupervized Word Segmentation: the Case for Mandarin Chinese
13 0.46379802 139 acl-2012-MIX Is Not a Tree-Adjoining Language
14 0.46239769 194 acl-2012-Text Segmentation by Language Using Minimum Description Length
15 0.45958251 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
16 0.45818669 88 acl-2012-Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
17 0.45516694 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
18 0.4474808 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing
19 0.44733334 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
20 0.44283175 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction