acl acl2010 acl2010-202 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: S.R.K. Branavan ; Luke Zettlemoyer ; Regina Barzilay
Abstract: In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging—they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement learning algorithm for text interpretation. This design enables learning for mapping high-level instructions, which previous statistical methods cannot handle.
Reference: text
sentIndex sentText sentNum sentScore
1 S.R.K. Branavan, Luke Zettlemoyer, Regina Barzilay (Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology). Abstract In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. [sent-5, score-0.866]
2 Processing these instructions is challenging—they posit goals to be achieved without specifying the steps required to complete them. [sent-6, score-0.554]
3 We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. [sent-7, score-0.831]
4 We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement learning algorithm for text interpretation. [sent-8, score-0.727]
5 1 Introduction In this paper, we introduce a novel method for mapping high-level instructions to commands in an external environment. [sent-10, score-0.818]
6 These instructions specify goals to be achieved without explicitly stating all the required steps. [sent-11, score-0.528]
7 This dependence on domain knowledge makes the automatic interpretation of high-level instructions particularly challenging. [sent-14, score-0.548]
8 The standard approach to this task is to start with both a manually-developed model of the environment, and rules for interpreting high-level instructions in the context of this model (Agre and ...). Footnote 1: Code, data, and annotations used in this work are available at http://groups. [sent-15, score-0.573]
9 Our approach, in contrast, operates directly on the textual instructions in the context of the interactive environment, while requiring no additional information. [sent-23, score-0.494]
10 By interacting with the environment and observing the resulting feedback, our method automatically learns both the mapping between the text and the commands, and the underlying model of the environment. [sent-24, score-0.63]
11 One particularly noteworthy aspect of our solution is the interplay between the evolving mapping and the progressively acquired environment model as the system learns how to interpret the text. [sent-25, score-0.664]
12 At the same time, the environment model enables the algorithm to consider the consequences of commands before they are executed, thereby improving the accuracy of interpretation. [sent-27, score-0.544]
13 We apply our method to the task of mapping software troubleshooting guides to GUI actions in the Windows environment (Branavan et al. [sent-29, score-0.716]
14 Second, we demonstrate that explicitly modeling the environment also greatly improves the accuracy of processing low-level instructions, yielding a 14% absolute increase in performance over a competitive baseline (Branavan et al. [sent-35, score-0.498]
15 Finally, we show the importance of constructing an environment model relevant to the language interpretation task using textual [sent-37, score-0.632]
16 The mapping process involves segmenting the document into individual instruction word spans Wa, and translating each instruction into the sequence c of one or more commands it describes. [sent-40, score-0.864]
17 instructions enables us to bias exploration toward transitions relevant for language learning. [sent-42, score-0.636]
18 This approach yields superior performance compared to a policy that relies on an environment model constructed via random exploration. [sent-43, score-0.684]
19 2 Related Work Interpreting Instructions Our approach is most closely related to the reinforcement learning algorithm for mapping text instructions to commands developed by Branavan et al. [sent-44, score-1.019]
20 Their method is predicated on the assumption that each command to be executed is explicitly specified in the instruction text. [sent-46, score-0.437]
21 This assumption of a direct correspondence between the text and the environment is not unique to that paper, being inherent in other work on grounded language learning (Siskind, 2001; Oates, 2001; Yu and Ballard, 2004; Fleischman and Roy, 2005; Mooney, 2008; Liang et al. [sent-47, score-0.498]
22 (2009), which learns how an environment operates by reading text, rather than learning an explicit mapping from the text to the environment. [sent-51, score-0.654]
23 For example, their method can learn the rules of a card game given instructions for how to play. [sent-52, score-0.471]
24 Many instances of work on instruction interpretation are replete with examples where instructions are formulated as high-level goals, targeted at users with relevant knowledge (Winograd, 1972; Di Eugenio, 1992; Webber et al. [sent-53, score-0.768]
25 Not surprisingly, automatic approaches for processing such instructions have relied on hand-engineered world knowledge to reason about the preconditions and effects of environment commands. [sent-56, score-0.967]
26 The assumption of a fully specified environment model is also common in work on semantics in the linguistics literature (Lascarides and Asher, 2004). [sent-57, score-0.572]
27 While our approach learns to analyze instructions in a goal-directed manner, it does not require manual specification of relevant environment knowledge. [sent-58, score-0.471]
28 The first approach, model-based learning, constructs a model of the environment in which the learner operates (e. [sent-60, score-0.52]
29 It then computes a policy directly from the rich information represented in the induced environment model. [sent-63, score-0.656]
30 However, if the environment cannot be accurately approximated by a compact representation, these methods perform poorly (Boyan and Moore, 1995; Jong and Stone, 2007). [sent-66, score-0.492]
31 (Figure residue: policy function, state, observed text and environment labels.) [sent-72, score-0.514]
32 State s is comprised of the state of the external environment E, and the state of the document (d, W), where W is the list of all word spans mapped by previous actions. [sent-75, score-0.771]
33 An action a selects a span Wa of unused words from (d, W), and maps them to an environment command c. [sent-76, score-0.733]
34 As a consequence of a, the environment state changes to E0 ∼ p(E0 |E, c), and the list of mapped words is updated to W0 = W ∪ Wa. [sent-77, score-0.56]
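To make the state and action bookkeeping concrete, here is a minimal Python sketch of the mapping state s = (E, d, W) and the update applied after an action a = (c, Wa). The class and function names, and the representations chosen for spans and commands, are illustrative assumptions rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Span = Tuple[int, int]   # hypothetical: a word span as (start, end) token indices
Command = str            # hypothetical: e.g. "left-click(Start)"

@dataclass(frozen=True)
class MappingState:
    env: object                          # E: opaque external environment state
    doc: Tuple[str, ...]                 # d: the instruction document as tokens
    used: FrozenSet[Span] = frozenset()  # W: word spans mapped by previous actions

def step(state: MappingState, span: Span, command: Command, execute):
    """Apply action a = (command, span): execute the command in the environment
    (a sample E0 ~ p(E0|E, c)) and record the span as used (W0 = W ∪ Wa)."""
    new_env = execute(state.env, command)
    return MappingState(new_env, state.doc, state.used | {span})
```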
35 While policy learners can effectively operate in complex environments, they are not designed to benefit from a learned environment model. [sent-79, score-0.469]
36 We address this limitation by expanding a policy learning algorithm to take advantage of a partial environment model estimated during learning. [sent-80, score-0.788]
37 The approach of conditioning the policy function on future reachable states is similar in concept to the use of postdecision state information in the approximate dynamic programming framework (Powell, 2007). [sent-81, score-0.444]
38 3 Problem Formulation Our goal is to map instructions expressed in a natural language document d into the corresponding sequence of commands c = ⟨c1, . . . , cn⟩. [sent-82, score-0.843]
39 As input, we are given a set of raw instruction documents, an environment, and a reward function as described below. [sent-86, score-0.394]
40 The environment is formalized as its states and transition function. [sent-87, score-0.62]
41 The environment state transition function p(E0|E, c) encodes how the state changes from E to E0 in response to a command c. [sent-90, score-0.773]
42 During learning, this function is not known, but samples from it can be collected by executing commands and observing the resulting state transitions. (Footnote 3: While in the general case the environment state transitions may be stochastic, they are deterministic in the software GUI used in this work.) [sent-91, score-1.051]
43 A real-valued reward function measures how well a command sequence c achieves the task described in the document. [sent-93, score-0.454]
44 4 Background Our innovation takes place within a previously established general framework for the task of mapping instructions to commands (Branavan et al. [sent-102, score-0.557]
45 This framework formalizes the mapping process as a Markov Decision Process (MDP) (Sutton and Barto, 1998), with actions encoding individual instruction-to-command mappings, and states representing partial interpretations of the document. [sent-104, score-0.606]
46 Figure 3: Using information derived from future states to interpret the high-level instruction “open control panel. [sent-108, score-0.386]
47 Environment states are shown as circles, with previously visited environment states colored green. [sent-110, score-0.665]
48 All else being equal, the information that the control panel icon was observed in state E5 during previous exploration steps can help to correctly select command c3. [sent-112, score-0.554]
49 Each action selects a word span from the document, and maps it to one environment command. [sent-114, score-0.636]
50 To predict actions sequentially, we track the states of the environment and the document over time as shown in Figure 2. [sent-115, score-0.803]
51 The mapping action a is a tuple (c, Wa) that represents the joint selection of a span of words Wa and an environment command c. [sent-118, score-0.915]
52 Some of the candidate actions would correspond to the correct instruction mappings, e. [sent-119, score-0.353]
53 The algorithm learns to interpret instructions by learning to construct sequences of actions that assign the correct commands to the words. [sent-123, score-0.814]
54 The interpretation of a document d begins at an initial mapping state s0 = (Ed, d, ∅), Ed being the starting state of the environment for the document. [sent-124, score-0.42]
55 Given a state s = (E, d, W), the space of possible actions a = (c, Wa) is defined by enumerating sub-spans of unused words in d and candidate commands in E. [sent-125, score-0.375]
56 6 A Log-Linear Parameterization The policy function used for action selection is defined as a log-linear distribution over actions: p(a|s; θ) = exp(θ · φ(s, a)) / Σ_a′ exp(θ · φ(s, a′)) (1), where θ ∈ Rn is a weight vector, and φ(s, a) ∈ Rn is an n-dimensional feature function. [sent-132, score-0.374]
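Equation (1) is a standard softmax over the candidate actions. Below is a minimal sketch of computing such a distribution, assuming a feature function phi(state, action) that returns a NumPy vector; the function and argument names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def action_distribution(state, actions, theta, phi):
    """Log-linear policy of Equation (1): p(a|s; theta) proportional to
    exp(theta . phi(s, a)), normalized over all candidate actions."""
    scores = np.array([theta @ phi(state, a) for a in actions])
    scores -= scores.max()        # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```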
57 The main challenge in processing these instructions is that, in contrast to their low-level counterparts, they correspond to sequences of one or more commands. [sent-140, score-0.519]
58 However, this change significantly complicates the interpretation problem: we need to be able to predict commands that are not directly described by any words, and allowing action sequences significantly increases the space of possibilities for each instruction. [sent-146, score-0.386]
59 To motivate the approach, consider the decision problem in Figure 3, where we need to find a command sequence for the high-level instruction “open control panel. [sent-148, score-0.506]
60 ” The algorithm focuses on command sequences leading to environment states where the control panel icon was previously observed. [sent-149, score-1.033]
61 The information about such states is acquired during exploration and is stored in a partial environment model q(E0 |E, c) . [sent-150, score-0.687]
62 Our goal is to map high-level instructions to command sequences by leveraging knowledge about the long-term effects of commands. [sent-151, score-0.762]
63 We do this by integrating the partial environment model into the policy function. [sent-152, score-0.735]
64 Below, we first describe how we estimate the partial environment transition model and how this model is used to compute the look-ahead features. [sent-156, score-0.652]
65 1 Partial Environment Transition Model To compute the look-ahead features, we first need to collect statistics about the environment transition function p(E0 |E, c). [sent-159, score-0.567]
66 We collect this information through observation, and build a partial environment transition model q(E0|E, c). [sent-162, score-0.601]
67 One possible strategy for constructing q is to observe the effects of executing random commands in the environment. [sent-163, score-0.399]
68 During training, we execute the command sequences predicted by the policy function in the environment, caching the resulting state transitions. [sent-166, score-0.621]
69 As learning progresses and the quality of the interpretation improves, more promising parts of the environment will be observed. [sent-168, score-0.575]
70 Instead, we capitalize on the state transitions observed during the sampling process described above, allowing us to incrementally build an environment model of actions and their effects. [sent-173, score-0.845]
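As a sketch of this incremental construction (under the assumption that environment states and commands are hashable), the partial model q can be kept as a simple cache of observed (E, c) to E0 transitions; the class below is illustrative, not the paper's data structure.

```python
from collections import defaultdict

class PartialTransitionModel:
    """Observed portion q(E0|E, c) of the true transition model p(E0|E, c),
    accumulated from transitions seen while executing predicted commands."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, env_state, command, next_state):
        """Cache one observed transition (E, c) -> E0."""
        self.counts[(env_state, command)][next_state] += 1

    def successors(self, env_state, command):
        """Empirical distribution over next states observed for (E, c);
        empty dict if this (state, command) pair has never been executed."""
        seen = self.counts.get((env_state, command), {})
        total = sum(seen.values())
        return {s: n / total for s, n in seen.items()} if total else {}
```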
71 Based on this transition information, we can estimate the usefulness of actions by considering the properties of states they can reach. [sent-174, score-0.361]
72 This property is computed using the learned environment model, and is therefore an approximation. [sent-178, score-0.469]
73 Because we can never encounter all states and all actions, our environment model is always incomplete and these properties can only be computed based on partial information. [sent-182, score-0.672]
74 In particular, we select actions a based on the current state s and the partial environment model q, resulting in the following policy definition: p(a|s; q, θ) = exp(θ · φ(s, a, q)) / Σ_a′ exp(θ · φ(s, a′, q)) (3), where the feature representation φ(s, a, q) has been extended to be a function of q. [sent-185, score-1.032]
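Equation (3) conditions the policy on q through look-ahead features. One way such a feature could be realized is sketched below, under the assumption that usefulness is judged by whether a reachable state contains a target UI object label; the helper names, the dictionary layout of the observed transitions, and the (command, span) action tuple are hypothetical.

```python
import numpy as np

def lookahead_feature(env_state, command, observed_transitions, target_label):
    """Illustrative look-ahead feature: 1.0 if, according to the partial model
    (a dict mapping (E, c) to the set of next states observed so far), the
    command reaches a state where an object labeled `target_label` was seen."""
    for next_state in observed_transitions.get((env_state, command), ()):
        if target_label in getattr(next_state, "object_labels", ()):
            return 1.0
    return 0.0

def phi_with_lookahead(phi_local, state, action, observed_transitions, target_label):
    """phi(s, a, q): local features extended with a look-ahead feature, as in Eq. (3)."""
    env_state, command = state.env, action[0]   # action assumed to be (c, Wa)
    extra = np.array([lookahead_feature(env_state, command,
                                        observed_transitions, target_label)])
    return np.concatenate([phi_local(state, action), extra])
```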
75 3 Parameter Estimation The learning algorithm is provided with a set of documents d ∈ D, an environment in which to execute command sequences, and a reward function r(h). [sent-187, score-0.629]
76 The goal is to estimate two sets of parameters: 1) the parameters θ of the policy function, and 2) the partial environment transition model q(E0|E, c), which is the observed portion of the true model p(E0|E, c). [sent-188, score-0.834]
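θ can be estimated with a policy-gradient (REINFORCE-style) update over histories sampled from the current policy and scored by r(h). The sketch below is a generic such update, assuming each history step stores the state, the chosen action, and the candidate actions; it is not the authors' exact algorithm.

```python
import numpy as np

def policy_gradient_update(theta, history, reward, phi, learning_rate=0.1):
    """One REINFORCE-style step: move theta in the direction of
    r(h) * sum_t (phi(s_t, a_t) - E_{p(a|s_t)}[phi(s_t, a)])."""
    grad = np.zeros_like(theta)
    for state, chosen_action, candidate_actions in history:
        feats = np.array([phi(state, a) for a in candidate_actions])
        scores = feats @ theta
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        expected = probs @ feats                      # expected feature vector
        grad += phi(state, chosen_action) - expected  # policy-gradient term
    return theta + learning_rate * reward * grad
```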
77 An improved policy function in turn produces state samples that are more relevant to the document interpretation task. [sent-204, score-0.534]
78 Environment States and Actions In this application of our model, the environment state is the set of visible user interface (UI) objects, along 7http://support. [sent-210, score-0.56]
79 The environment commands consist of the UI commands left-click, right-click, double-click, and type-into. [sent-215, score-0.991]
80 Since such verification is a challenging task, we rely on a noisy approximation: we assume that each sentence specifies at least one command, and that the text describing the command has words matching the label of the environment object. [sent-220, score-0.71]
81 If a history h has at least one such command for each sentence, the environment reward function r(h) returns a positive value, otherwise it returns a negative value. [sent-221, score-0.91]
82 This environment reward function is a simplification of the one described in Branavan et al. [sent-222, score-0.671]
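A minimal sketch of this kind of noisy reward check follows, assuming the history records, for each sentence, its words and the labels of the environment objects its executed commands acted on; the data layout and names are assumptions for illustration, and the check is simpler than the reward actually used.

```python
def environment_reward(history):
    """Noisy reward r(h): +1 if every sentence has at least one executed
    command whose target object's label shares a word with the sentence
    text, and -1 otherwise."""
    for sentence_words, object_labels in history:
        words = {w.lower() for w in sentence_words}
        matched = any(words & set(label.lower().split()) for label in object_labels)
        if not matched:
            return -1.0
    return 1.0

# "open control panel" paired with a click on an object labeled "Control Panel"
print(environment_reward([(["open", "control", "panel"], ["Control Panel"])]))  # 1.0
```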
83 These features are functions of both the text and environment state, modeling local properties that are useful for action selection. [sent-227, score-0.614]
84 states with the lowest possible immediate reward, and use the induced environment model to encourage additional exploration by lowering the likelihood of actions that lead to such dead-end states. [sent-241, score-0.797]
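A sketch of how such a penalty could enter the feature set, assuming dead-end states (those with the lowest possible immediate reward) are tracked in a set and the observed transitions are kept as a dict from (E, c) to the next states seen; both structures are hypothetical.

```python
def dead_end_feature(env_state, command, observed_transitions, dead_end_states):
    """Illustrative feature: -1.0 if every observed successor of (E, c) is a
    known dead-end state, lowering the likelihood of actions that lead there;
    0.0 if nothing has been observed yet or some successor is not a dead end."""
    successors = observed_transitions.get((env_state, command))
    if not successors:
        return 0.0
    return -1.0 if all(s in dead_end_states for s in successors) else 0.0
```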
85 During the early stages of learning, experience gathered in the environment model is extremely sparse, causing the look-ahead features to provide poor estimates. [sent-242, score-0.497]
86 For example, executing an incorrect action early on often leads to an environment state from which the remaining instructions cannot be completed. [sent-255, score-1.231]
87 This discrepancy is explained by the fact that in this dataset, high-level instructions are often located towards the beginning of the document. [sent-270, score-0.471]
88 If these initial challenging instructions are not processed correctly, the rest of the actions for the document cannot be interpreted. [sent-271, score-0.732]
89 We also performed experiments to validate the intuition that the partial environment model must contain information relevant for the language interpretation task. [sent-275, score-0.653]
90 To test this hypothesis, we replaced the learned environment model with one of the same size gathered by executing random commands. [sent-276, score-0.578]
91 The model with randomly sampled environment transitions performs poorly: it can only process 4. [sent-277, score-0.57]
92 This result also explains why training with full supervision hurts performance on high-level instructions (see Table 1). [sent-281, score-0.497]
93 Finally, to demonstrate the quality of the learned word–command alignments, we evaluate our method’s ability to paraphrase from high-level instructions to low-level instructions. [sent-285, score-0.471]
94 We did this by finding high-level instructions where each of the commands they are associated with is also described by a low-level instruction in some other document. [sent-287, score-0.924]
95 For example, if the text “open control panel” was mapped to the three commands in Figure 1, and each of those commands was described by a low-level instruction elsewhere, this procedure would create a paraphrase such as “click start, left click setting, and select control panel. [sent-288, score-0.908]
96 ” Of the 60 high-level instructions tagged in the test set, this approach found paraphrases for 33 of them. [sent-289, score-0.524]
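As a sketch of the paraphrasing procedure just described, assume we already have the commands a high-level instruction was mapped to and an index from commands to low-level instructions found in other documents; all names and the example index are hypothetical.

```python
def paraphrase(high_level_commands, command_to_low_level):
    """Replace a high-level instruction with the low-level instructions that
    describe each of its commands elsewhere; return None if any command has
    no low-level description."""
    pieces = []
    for command in high_level_commands:
        low_level = command_to_low_level.get(command)
        if low_level is None:
            return None
        pieces.append(low_level)
    return ", ".join(pieces)

# Hypothetical index for the "open control panel" example above:
index = {
    "left-click(Start)": "click start",
    "left-click(Settings)": "left click setting",
    "left-click(Control Panel)": "select control panel",
}
print(paraphrase(list(index.keys()), index))
# -> "click start, left click setting, select control panel"
```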
97 9 Conclusions and Future Work In this paper, we demonstrate that knowledge about the environment can be learned and used effectively for the task of mapping instructions to actions. [sent-292, score-0.94]
98 A key feature of this approach is the synergy between language analysis and the construction of the environment model: instruction text drives the sampling of the environment transitions, while the acquired environment model facilitates language interpretation. [sent-293, score-1.651]
99 This design enables us to learn to map high-level instructions while also improving accuracy on low-level instructions. [sent-294, score-0.494]
100 An interesting avenue of future work is to explore an alternative approach which learns these phenomena by combining linguistic information with knowledge gleaned from an automatically induced environment model. [sent-297, score-0.516]
wordName wordTfidf (topN-words)
[('instructions', 0.471), ('environment', 0.469), ('commands', 0.261), ('command', 0.216), ('branavan', 0.192), ('instruction', 0.192), ('policy', 0.187), ('actions', 0.161), ('reward', 0.157), ('reinforcement', 0.148), ('action', 0.119), ('states', 0.098), ('state', 0.091), ('wa', 0.088), ('mapping', 0.086), ('panel', 0.085), ('executing', 0.081), ('interpretation', 0.077), ('document', 0.075), ('transitions', 0.073), ('gui', 0.072), ('click', 0.07), ('eugenio', 0.064), ('sutton', 0.063), ('control', 0.062), ('documents', 0.059), ('kushman', 0.054), ('transition', 0.053), ('partial', 0.051), ('sequences', 0.048), ('learns', 0.047), ('interpreting', 0.046), ('double', 0.045), ('function', 0.045), ('gradient', 0.043), ('singh', 0.043), ('exploration', 0.041), ('di', 0.039), ('barto', 0.038), ('grounding', 0.037), ('sequence', 0.036), ('agre', 0.036), ('boyan', 0.036), ('darken', 0.036), ('satinder', 0.036), ('webber', 0.035), ('interpret', 0.034), ('regina', 0.034), ('execute', 0.034), ('windows', 0.034), ('luke', 0.031), ('macmahon', 0.031), ('matuszek', 0.031), ('icon', 0.031), ('jong', 0.031), ('samples', 0.031), ('barbara', 0.031), ('dataset', 0.031), ('ui', 0.03), ('constructing', 0.03), ('explicitly', 0.029), ('learning', 0.029), ('zettlemoyer', 0.029), ('model', 0.028), ('goals', 0.028), ('relevant', 0.028), ('steps', 0.028), ('paraphrases', 0.027), ('posit', 0.027), ('effects', 0.027), ('properties', 0.026), ('lascarides', 0.026), ('eisenstein', 0.026), ('highlevel', 0.026), ('fleischman', 0.026), ('schatzmann', 0.026), ('span', 0.025), ('challenging', 0.025), ('nips', 0.024), ('synergy', 0.024), ('tso', 0.024), ('tab', 0.024), ('advanced', 0.024), ('microsoft', 0.024), ('algorithm', 0.024), ('maps', 0.023), ('reachable', 0.023), ('estimate', 0.023), ('st', 0.023), ('parameters', 0.023), ('accurately', 0.023), ('incrementally', 0.023), ('operates', 0.023), ('enables', 0.023), ('datasets', 0.023), ('stochastic', 0.023), ('history', 0.023), ('ee', 0.023), ('iws', 0.023), ('spans', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
Author: S.R.K. Branavan ; Luke Zettlemoyer ; Regina Barzilay
Abstract: In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging—they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policygradient reinforcement learning algorithm for text interpretation. This design enables learning for mapping high-level instructions, which previous statistical methods cannot handle.1
2 0.2985622 168 acl-2010-Learning to Follow Navigational Directions
Author: Adam Vogel ; Dan Jurafsky
Abstract: We present a system that learns to follow navigational natural language directions. Where traditional models learn from linguistic annotation or word distributions, our approach is grounded in the world, learning by apprenticeship from routes through a map paired with English descriptions. Lacking an explicit alignment between the text and the reference path makes it difficult to determine what portions of the language describe which aspects of the route. We learn this correspondence with a reinforcement learning algorithm, using the deviation of the route we follow from the intended path as a reward signal. We demonstrate that our system successfully grounds the meaning of spatial terms like above and south into geometric properties of paths.
3 0.16772674 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
Author: Srinivasan Janarthanam ; Oliver Lemon
Abstract: We present a data-driven approach to learn user-adaptive referring expression generation (REG) policies for spoken dialogue systems. Referring expressions can be difficult to understand in technical domains where users may not know the technical ‘jargon’ names of the domain entities. In such cases, dialogue systems must be able to model the user’s (lexical) domain knowledge and use appropriate referring expressions. We present a reinforcement learning (RL) framework in which the sys- tem learns REG policies which can adapt to unknown users online. Furthermore, unlike supervised learning methods which require a large corpus of expert adaptive behaviour to train on, we show that effective adaptive policies can be learned from a small dialogue corpus of non-adaptive human-machine interaction, by using a RL framework and a statistical user simulation. We show that in comparison to adaptive hand-coded baseline policies, the learned policy performs significantly better, with an 18.6% average increase in adaptation accuracy. The best learned policy also takes less dialogue time (average 1.07 min less) than the best hand-coded policy. This is because the learned policies can adapt online to changing evidence about the user’s domain expertise.
4 0.15817258 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions ofthe state and action spaces.
5 0.15380006 35 acl-2010-Automated Planning for Situated Natural Language Generation
Author: Konstantina Garoufi ; Alexander Koller
Abstract: We present a natural language generation approach which models, exploits, and manipulates the non-linguistic context in situated communication, using techniques from AI planning. We show how to generate instructions which deliberately guide the hearer to a location that is convenient for the generation of simple referring expressions, and how to generate referring expressions with context-dependent adjectives. We implement and evaluate our approach in the framework of the Challenge on Generating Instructions in Virtual Environments, finding that it performs well even under the constraints of realtime generation.
6 0.12113605 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
7 0.087643482 190 acl-2010-P10-5005 k2opt.pdf
8 0.084898941 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems
9 0.069148652 13 acl-2010-A Rational Model of Eye Movement Control in Reading
10 0.068731621 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
11 0.067086101 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
12 0.054654229 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
13 0.053252976 47 acl-2010-Beetle II: A System for Tutoring and Computational Linguistics Experimentation
14 0.0532046 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
15 0.053133894 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
16 0.052763514 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
17 0.050687809 31 acl-2010-Annotation
18 0.048503898 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue
19 0.047745299 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
20 0.046282243 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
topicId topicWeight
[(0, -0.156), (1, 0.058), (2, -0.059), (3, -0.147), (4, -0.029), (5, -0.169), (6, -0.115), (7, 0.038), (8, 0.005), (9, 0.02), (10, -0.03), (11, -0.041), (12, 0.044), (13, 0.008), (14, -0.022), (15, -0.117), (16, 0.083), (17, 0.113), (18, -0.05), (19, 0.026), (20, 0.061), (21, -0.069), (22, -0.112), (23, 0.074), (24, 0.039), (25, -0.057), (26, -0.021), (27, 0.074), (28, -0.137), (29, -0.059), (30, 0.214), (31, -0.348), (32, 0.089), (33, 0.047), (34, -0.04), (35, 0.132), (36, -0.041), (37, 0.206), (38, -0.107), (39, -0.007), (40, 0.083), (41, -0.037), (42, 0.044), (43, 0.069), (44, -0.038), (45, 0.028), (46, 0.091), (47, -0.104), (48, 0.077), (49, 0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.96013635 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
Author: S.R.K. Branavan ; Luke Zettlemoyer ; Regina Barzilay
Abstract: In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging—they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policygradient reinforcement learning algorithm for text interpretation. This design enables learning for mapping high-level instructions, which previous statistical methods cannot handle.1
2 0.94371849 168 acl-2010-Learning to Follow Navigational Directions
Author: Adam Vogel ; Dan Jurafsky
Abstract: We present a system that learns to follow navigational natural language directions. Where traditional models learn from linguistic annotation or word distributions, our approach is grounded in the world, learning by apprenticeship from routes through a map paired with English descriptions. Lacking an explicit alignment between the text and the reference path makes it difficult to determine what portions of the language describe which aspects of the route. We learn this correspondence with a reinforcement learning algorithm, using the deviation of the route we follow from the intended path as a reward signal. We demonstrate that our system successfully grounds the meaning of spatial terms like above and south into geometric properties of paths.
3 0.78053051 35 acl-2010-Automated Planning for Situated Natural Language Generation
Author: Konstantina Garoufi ; Alexander Koller
Abstract: We present a natural language generation approach which models, exploits, and manipulates the non-linguistic context in situated communication, using techniques from AI planning. We show how to generate instructions which deliberately guide the hearer to a location that is convenient for the generation of simple referring expressions, and how to generate referring expressions with context-dependent adjectives. We implement and evaluate our approach in the framework of the Challenge on Generating Instructions in Virtual Environments, finding that it performs well even under the constraints of realtime generation.
4 0.58774054 190 acl-2010-P10-5005 k2opt.pdf
Author: empty-author
Abstract: unkown-abstract
5 0.50163776 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions ofthe state and action spaces.
6 0.3808094 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
7 0.37977314 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
8 0.37101358 142 acl-2010-Importance-Driven Turn-Bidding for Spoken Dialogue Systems
9 0.34751201 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
10 0.32055673 13 acl-2010-A Rational Model of Eye Movement Control in Reading
11 0.2917721 179 acl-2010-Now, Where Was I? Resumption Strategies for an In-Vehicle Dialogue System
12 0.28813437 61 acl-2010-Combining Data and Mathematical Models of Language Change
13 0.28116226 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.
14 0.27828336 224 acl-2010-Talking NPCs in a Virtual Game World
15 0.24512252 64 acl-2010-Complexity Assumptions in Ontology Verbalisation
16 0.23647478 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
17 0.22928284 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis
18 0.22914186 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
19 0.22886869 248 acl-2010-Unsupervised Ontology Induction from Text
20 0.22248298 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
topicId topicWeight
[(14, 0.057), (25, 0.059), (28, 0.011), (39, 0.013), (42, 0.033), (44, 0.019), (59, 0.097), (73, 0.054), (74, 0.237), (78, 0.027), (83, 0.092), (84, 0.057), (98, 0.129)]
simIndex simValue paperId paperTitle
1 0.90858781 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction
Author: Laura Chiticariu ; Rajasekar Krishnamurthy ; Yunyao Li ; Sriram Raghavan ; Frederick Reiss ; Shivakumar Vaithyanathan
Abstract: As information extraction (IE) becomes more central to enterprise applications, rule-based IE engines have become increasingly important. In this paper, we describe SystemT, a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules. We compare SystemT’s approach against cascading grammars, both theoretically and with a thorough experimental evaluation. Our results show that SystemT can deliver result quality comparable to the state-of-the- art and an order of magnitude higher annotation throughput.
same-paper 2 0.81547415 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
Author: S.R.K. Branavan ; Luke Zettlemoyer ; Regina Barzilay
Abstract: In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging—they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policygradient reinforcement learning algorithm for text interpretation. This design enables learning for mapping high-level instructions, which previous statistical methods cannot handle.1
3 0.65464127 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
Author: Weiwei Guo ; Mona Diab
Abstract: Word Sense Disambiguation remains one ofthe most complex problems facing computational linguists to date. In this paper we present a system that combines evidence from a monolingual WSD system together with that from a multilingual WSD system to yield state of the art performance on standard All-Words data sets. The monolingual system is based on a modification ofthe graph based state ofthe art algorithm In-Degree. The multilingual system is an improvement over an AllWords unsupervised approach, SALAAM. SALAAM exploits multilingual evidence as a means of disambiguation. In this paper, we present modifications to both of the original approaches and then their combination. We finally report the highest results obtained to date on the SENSEVAL 2 standard data set using an unsupervised method, we achieve an overall F measure of 64.58 using a voting scheme.
4 0.65208447 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
Author: Stephen Wu ; Asaf Bachrach ; Carlos Cardenas ; William Schuler
Abstract: Hierarchical HMM (HHMM) parsers make promising cognitive models: while they use a bounded model of working memory and pursue incremental hypotheses in parallel, they still achieve parsing accuracies competitive with chart-based techniques. This paper aims to validate that a right-corner HHMM parser is also able to produce complexity metrics, which quantify a reader’s incremental difficulty in understanding a sentence. Besides defining standard metrics in the HHMM framework, a new metric, embedding difference, is also proposed, which tests the hypothesis that HHMM store elements represents syntactic working memory. Results show that HHMM surprisal outperforms all other evaluated metrics in predicting reading times, and that embedding difference makes a significant, independent contribution.
5 0.64844835 214 acl-2010-Sparsity in Dependency Grammar Induction
Author: Jennifer Gillenwater ; Kuzman Ganchev ; Joao Graca ; Fernando Pereira ; Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. We explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. Specifically, we investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In ex- periments with 12 languages, we achieve substantial gains over the standard expectation maximization (EM) baseline, with average improvement in attachment accuracy of 6.3%. Further, our method outperforms models based on a standard Bayesian sparsity-inducing prior by an average of 4.9%. On English in particular, we show that our approach improves on several other state-of-the-art techniques.
6 0.64691544 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
7 0.6421535 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
8 0.64208913 162 acl-2010-Learning Common Grammar from Multilingual Corpus
9 0.64156455 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
10 0.63988793 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
11 0.63955295 136 acl-2010-How Many Words Is a Picture Worth? Automatic Caption Generation for News Images
12 0.63936239 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
13 0.6384905 195 acl-2010-Phylogenetic Grammar Induction
14 0.6378783 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
15 0.63662481 39 acl-2010-Automatic Generation of Story Highlights
16 0.6352163 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
17 0.63461804 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
18 0.63430321 158 acl-2010-Latent Variable Models of Selectional Preference
19 0.63427347 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
20 0.63427043 71 acl-2010-Convolution Kernel over Packed Parse Forest