acl acl2010 acl2010-165 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. [sent-3, score-0.406]
2 We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. [sent-4, score-0.637]
3 Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. [sent-5, score-0.489]
4 1 Introduction A script is “a standardized sequence of events that describes some stereotypical human activity such as going to a restaurant or visiting a doctor” (Barr and Feigenbaum, 1981). [sent-7, score-0.778]
5 ”, because the SHOPPING script involves a ‘payment’ event, which again involves the transfer of money. [sent-9, score-0.379]
6 It has long been recognized that text understanding systems would benefit from the implicit information represented by a script (Cullingford, 1977; Mueller, 2004; Miikkulainen, 1995). [sent-10, score-0.414]
7 However, it is also commonly accepted that the large-scale manual formalization of scripts is infeasible. [sent-13, score-0.2]
8 While there have been a few attempts at doing this (Mueller, 1998; Gordon, 2001), efforts in which expert annotators create script knowledge bases clearly don’t scale. [sent-14, score-0.379]
9 , 2008); but while these efforts have achieved impressive results, they are limited by the very fact that a lot of scripts such as SHOPPING are shared implicit knowledge, and their events are therefore rarely elaborated in text. [sent-18, score-0.358]
10 We focus on the temporal event structure of scripts; that is, we aim to learn what phrases can describe the same event in a script, and what constraints must hold on the temporal order in which these events occur. [sent-20, score-1.35]
11 We approach this problem by asking non-experts to describe typical event sequences in a given scenario over the Internet. [sent-21, score-0.526]
12 This allows us to assemble large and varied collections of event sequence descriptions (ESDs), which are focused on a single scenario. [sent-22, score-0.641]
13 We then compute a temporal script graph for the scenario by identifying corresponding event descriptions using a Multiple Sequence Alignment algorithm from bioinformatics, and converting the alignment into a graph. [sent-23, score-1.512]
14 This graph makes statements about what phrases can describe the same event of a scenario, and in what order these events can take place. [sent-24, score-0.596]
15 Crucially, our algorithm exploits the sequential structure of the ESDs to distinguish event descriptions that occur at different points in the script storyline, even when they are semantically similar. [sent-25, score-0.97]
16 We evaluate our script graph algorithm on ten unseen scenarios, and show that it significantly outperforms a clustering-based baseline. [sent-26, score-0.469]
17 © 2010 Association for Computational Linguistics, pages 979–98 [...] we understand scripts, and what aspect of scripts we model here, in Section 3. [sent-31, score-0.2]
18 Section 4 describes our data collection method, and Section 5 explains how we use Multiple Sequence Alignment to compute a temporal script graph. [sent-32, score-0.661]
19 For instance, Mooney (1990) describes an early attempt to acquire causal chains, and Smith and Arnold (2009) use a graph-based algorithm to learn temporal script structures. [sent-35, score-0.661]
20 More recently, there have been a number of approaches to automatically learning event chains from corpora (Chambers and Jurafsky, 2008b; Chambers and Jurafsky, 2009; Manshadi et al. [sent-37, score-0.312]
21 These systems typically employ a method for classifying temporal relations between given event descriptions (Chambers et al. [sent-39, score-0.807]
22 They achieve impressive performance at extracting high-level descriptions of procedures such as a CRIMINAL PROCESS. [sent-42, score-0.246]
23 Because our approach involves directly asking people for event sequence descriptions, it can focus on acquiring specific scripts from arbitrary domains, and we can control the level of granularity at which scripts are described. [sent-43, score-0.795]
24 Furthermore, we believe that much information about scripts is usually left implicit in texts and is therefore easier to learn from our more explicit data. [sent-44, score-0.235]
25 Finally, our system automatically learns different phrases which describe the same event together with the temporal ordering constraints. [sent-45, score-0.666]
26 Jones and Thompson (2003) describe an approach to identifying different natural language realizations for the same event considering the temporal structure of a scenario. [sent-46, score-0.561]
27 However, they don’t aim to acquire or represent the temporal structure of the whole script in the end. [sent-47, score-0.628]
28 Unlike Barzilay and Lee, we do not tackle the general paraphrase problem, but only consider whether two phrases describe the same event in the context of the same script. [sent-49, score-0.478]
29 Where EATING IN A RESTAURANT is a scenario, the script describes a number of events, such as ordering and leaving, that must occur in a certain order in order to constitute an EATING IN A RESTAURANT activity. [sent-59, score-0.446]
30 The classical perspective on scripts (Schank and Abelson, 1977) has been that next to defining some events with temporal constraints, a script also defines their participants and their causal connections. [sent-60, score-0.981]
31 Here we focus on the narrower task of learning the events that a script consists of, and of modeling and learning the temporal ordering constraints that hold between them. [sent-61, score-0.819]
32 Formally, we will specify a script (in this simplified sense) in terms of a directed graph Gs = (Es, Ts), where Es is a set of nodes representing the events of a scenario s, and Ts is a set of edges (ei, ek) indicating that the event ei typically happens before ek in s. [sent-62, score-1.118]
33 We call Gs the temporal script graph (TSG) for s. [sent-63, score-0.718]
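The graph definition above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `TemporalScriptGraph`, its method names, and the choice to read "happens before" transitively via reachability are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TemporalScriptGraph:
    """Directed graph G_s = (E_s, T_s) for one scenario s (hypothetical sketch)."""

    scenario: str
    # E_s: node id -> event descriptions that realize this event
    events: dict[int, set[str]] = field(default_factory=dict)
    # T_s: edges (e_i, e_k), read as "e_i typically happens before e_k"
    before: set[tuple[int, int]] = field(default_factory=set)

    def add_event(self, node_id: int, descriptions: set[str]) -> None:
        self.events.setdefault(node_id, set()).update(descriptions)

    def add_before(self, e_i: int, e_k: int) -> None:
        if e_i not in self.events or e_k not in self.events:
            raise KeyError("both endpoints must be known events")
        self.before.add((e_i, e_k))

    def happens_before(self, e_i: int, e_k: int) -> bool:
        """Assumption: treat the relation as transitive (reachability in T_s)."""
        stack, seen = [e_i], set()
        while stack:
            node = stack.pop()
            for a, b in self.before:
                if a == node and b not in seen:
                    if b == e_k:
                        return True
                    seen.add(b)
                    stack.append(b)
        return False
```

Under these assumptions, a node holding both 'walk into restaurant' and 'go to restaurant' represents one event with two descriptions, and an edge from that node to an ordering node states that entering typically precedes ordering.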
34 Each event in a TSG can usually be expressed with many different natural-language phrases. [sent-64, score-0.312]
35 We call a natural-language realization of an individual event in the script an event description, and we call a sequence of event descriptions that form one particular instance of the script an event sequence description (ESD). [sent-67, score-2.418]
36 Examples of ESDs for the FAST FOOD RESTAURANT script are shown in Fig. [sent-68, score-0.379]
37 Our goal in this paper is to take a set of ESDs for a given scenario as our input and then compute a TSG that clusters different descriptions of the same event into the same node, and contains edges that generalize the temporal information encoded in the ESDs. [sent-71, score-1.013]
38 4 Data Acquisition In order to automatically learn TSGs, we selected 22 scenarios for which we collect ESDs. [sent-72, score-0.149]
39 We deliberately included scenarios of varying complexity, including some that we considered hard to describe (CHILDHOOD, CREATE A HOMEPAGE), scenarios with highly variable orderings between events (MAKING SCRAMBLED EGGS), and scenarios for which we expected cultural differences (WEDDING). [sent-73, score-0.468]
40 For every scenario, we asked 25 people to enter a typical sequence of events in this scenario, in temporal order and in “bullet point style”. [sent-75, score-0.503]
41 Participants were allowed to skip a scenario if they felt unable to enter events for it, but had to indicate why. [sent-79, score-0.34]
42 The most frequent explanation for this was that they didn’t know how a certain scenario works: The scenario with the highest proportion of skipped forms was CREATE A HOMEPAGE, whereas MAKING SCRAMBLED EGGS was the only one in which nobody skipped a form. [sent-85, score-0.338]
43 when users misunderstood the scenario, or did not list the event descriptions in temporal order). [sent-89, score-0.807]
44 As the example illustrates, descriptions differ in their starting points (‘walk into restaurant’ vs. [sent-93, score-0.246]
45 ‘walk to counter’), the granularity of the descriptions (‘pay the bill’ vs. [sent-94, score-0.246]
46 event descriptions 8–11 in the third sequence), and the events that are mentioned in the sequence (not even ‘eat food’ is mentioned in all ESDs). [sent-95, score-0.764]
47 Overall, the ESDs we collected consisted of 9 events on average, but their lengths varied widely: For most scenarios, there were significant numbers of ESDs both with the minimum length of 5 and the maximum length of 16 and everything in between. [sent-96, score-0.159]
48 Combined with the fact that 93% of all individual event descriptions occurred only once, this makes it challenging to align the different ESDs with each other. [sent-97, score-0.558]
49 5 Temporal Script Graphs We will now describe how we compute a temporal script graph out of the collected data. [sent-98, score-0.754]
50 First, we identify phrases from different ESDs that describe the same event by computing a Multiple Sequence Alignment (MSA) of all ESDs for the same scenario. [sent-100, score-0.383]
51 Then we postprocess the MSA and convert it into a temporal script graph, which encodes and generalizes the temporal information contained in the original ESDs. [sent-101, score-0.877]
52 [Figure 2 residue: aligned event descriptions, e.g. ‘stand in line’, ‘look at menu board’, ‘decide on food and drink’, ‘tell cashier your order’, ‘listen to cashier repeat order’] [sent-136, score-0.178]
53 Figure 2: A MSA of four event sequence descriptions 5. [sent-206, score-0.641]
54 A sequence alignment algorithm takes as its input some sequences s1, . [sent-209, score-0.195]
55 In bioinformatics, the elements of Σ could be nucleotides and a sequence could be a DNA sequence; in our case, Σ contains the individual event descriptions in our data, and the sequences are the ESDs. [sent-213, score-0.686]
56 Each sequence alignment A can be assigned a cost c(A) in the following way: Xn Xm Xm c(A) = cgap· Σ? [sent-219, score-0.15]
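To make the idea of a cost-based alignment concrete, here is a hedged sketch restricted to two sequences, using standard dynamic programming in the style of Needleman and Wunsch. It is not the multiple sequence alignment procedure used in the paper, and the cost model is an assumption: every gap costs `c_gap`, and every aligned pair costs whatever the caller-supplied function `c_sub` returns. The names `align_pair`, `c_gap`, and `c_sub` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

GAP = None  # marker for "no event description in this slot"


def align_pair(
    s: List[str],
    t: List[str],
    c_gap: float,
    c_sub: Callable[[str, str], float],
) -> Tuple[float, List[Tuple[Optional[str], Optional[str]]]]:
    """Return (total cost, aligned rows) of a cheapest pairwise alignment."""
    n, m = len(s), len(t)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Borders: aligning a prefix against nothing costs one gap per element.
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + c_gap
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + c_gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + c_sub(s[i - 1], t[j - 1]),  # align both
                cost[i - 1][j] + c_gap,  # gap in t
                cost[i][j - 1] + c_gap,  # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    rows = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + c_sub(s[i - 1], t[j - 1]):
            rows.append((s[i - 1], t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + c_gap):
            rows.append((s[i - 1], GAP))
            i -= 1
        else:
            rows.append((GAP, t[j - 1]))
            j -= 1
    rows.reverse()
    return cost[n][m], rows
```

With an exact-match substitution cost (0 for identical strings, a high value otherwise), short ESDs are padded with gaps so that identical descriptions line up; a graded, similarity-based `c_sub` would let differently worded descriptions align as well.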
57 2 Semantic similarity In order to apply MSA to the problem of aligning ESDs, we choose Σ to be the set of all individual event descriptions in a given scenario. [sent-230, score-0.607]
58 Intuitively, we want the MSA to prefer the alignment of two phrases if they are semantically similar, i. [sent-231, score-0.138]
59 For these reasons, standard methods for similarity assessment are not straightforwardly applicable: Simple bagof-words approaches do not provide sufficiently good results, and standard taggers and parsers cannot process our descriptions with sufficient accuracy. [sent-238, score-0.295]
60 ) On the basis of this pseudo-parse, we compute the similarity measure sim: sim = α · pred + β · subj + γ · obj, where pred, subj, and obj are the similarity values for predicates, subjects and objects respectively, and α, β, γ are weights. [sent-242, score-0.279]
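The weighted combination can be written out directly. This is only an illustrative sketch: the function name `sim` mirrors the formula, but the default weights below are placeholder values chosen for the example, not the weights used in the paper, and the component similarities are assumed to be computed elsewhere.

```python
def sim(
    pred: float,
    subj: float,
    obj: float,
    alpha: float = 0.6,  # hypothetical weight for predicate similarity
    beta: float = 0.2,   # hypothetical weight for subject similarity
    gamma: float = 0.2,  # hypothetical weight for object similarity
) -> float:
    """Weighted sum sim = alpha * pred + beta * subj + gamma * obj."""
    return alpha * pred + beta * subj + gamma * obj
```

For example, with these placeholder weights, two descriptions with identical predicates, unrelated subjects, and half-similar objects would score 0.6 · 1.0 + 0.2 · 0.0 + 0.2 · 0.5 = 0.7.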
61 3 Building Temporal Script Graphs We can now compute a low-cost MSA for each scenario out of the ESDs. [sent-251, score-0.169]
62 From this alignment, we extract a temporal script graph, in the following way. [sent-252, score-0.628]
63 We interpret each node of the graph as representing a single event in the script, and the phrases that are collected in the node as different descriptions of this event; that is, we claim that these phrases are paraphrases in the context of this scenario. [sent-255, score-0.962]
64 At first we prune spurious nodes which contain only one event description. [sent-260, score-0.357]
65 Then we refine the graph by merging nodes whose elements should have been aligned in the first place but were missed by the MSA. [sent-261, score-0.198]
66 The semantic constraints check whether the event descriptions of the merged node would be sufficiently consistent according to the similarity measure from Section 5. [sent-263, score-0.681]
67 , 2004) to first cluster the event descriptions in u and v separately. [sent-266, score-0.604]
68 Then we combine the event descriptions from u and v and cluster the resulting set. [sent-267, score-0.604]
69 These structural constraints prevent the merging algorithm from introducing new temporal relations that are not supported by the input ESDs. [sent-271, score-0.314]
70 We take the output of this post-processing step as the temporal script graph. [sent-272, score-0.628]
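The conversion from an alignment to a graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the alignment is assumed to be a list of slots with one entry per ESD (`None` for a gap), nodes with a single description are pruned as described above, the node-merging step is omitted entirely, and the edge rule (connect two surviving nodes when they are consecutive in at least one ESD) is an assumption introduced for the example. The function name `graph_from_alignment` is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def graph_from_alignment(
    slots: List[List[Optional[str]]],
) -> Tuple[Dict[int, List[str]], Set[Tuple[int, int]]]:
    """slots[i][k] is ESD k's description in alignment slot i, or None for a gap."""
    nodes: Dict[int, List[str]] = {}
    for i, slot in enumerate(slots):
        phrases = [p for p in slot if p is not None]
        if len(phrases) > 1:  # prune nodes that contain only one description
            nodes[i] = phrases

    edges: Set[Tuple[int, int]] = set()
    num_esds = len(slots[0]) if slots else 0
    for k in range(num_esds):
        prev = None
        for i in sorted(nodes):
            if slots[i][k] is not None:
                # Assumed edge rule: consecutive surviving nodes within one ESD.
                if prev is not None:
                    edges.add((prev, i))
                prev = i
    return nodes, edges
```

Because every edge in this sketch is read off the order inside a single input ESD, it cannot introduce a temporal relation that no ESD supports, which is in the spirit of the structural constraints mentioned above.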
71 One node created by the node merging step was the top left one, which combines one original node containing ‘walk into restaurant’ and another with ‘go to restaurant’ . [sent-275, score-0.151]
72 The graph mostly groups phrases together into event nodes quite well, although there are some exceptions, such as the ‘collect utensils’ node. [sent-276, score-0.518]
73 Similarly, the temporal information in the graph is pretty accurate. [sent-277, score-0.339]
74 6 Evaluation We evaluated the two core aspects of our system: its ability to recognize descriptions of the same event (paraphrases) and the resulting temporal constraints it defines on the event descriptions (happens-before relation). [sent-279, score-1.399]
75 For each scenario, we created a paraphrase set out of 30 randomly selected pairs of event descriptions which the system classified as paraphrases and 30 completely random pairs. [sent-287, score-0.463]
76 For the paraphrase set, an exemplary question we asked the rater looks as follows, instantiating the Scenario and the two descriptions to compare appropriately: Imagine two people, both telling a story about SCENARIO. [sent-291, score-0.415]
77 For the happens-before task, the question template was the following: Imagine somebody telling a story about SCENARIO in which the events event1 and event2 occur. [sent-293, score-0.167]
78 , 2004) and fed it all event descriptions of a scenario. [sent-299, score-0.558]
79 We first created a similarity graph with one node per event description. [sent-300, score-0.491]
80 com/ [Table residue; columns: SCENARIO; PRECISION (sys, basecl, baselev); RECALL (sys, basecl, baselev); F-SCORE (sys, basecl, baselev, upper); scenario names and values not recoverable] [sent-303, score-0.478]
81 1 with a weighted edge; the weight reflects the semantic similarity of the nodes’ event descriptions as described in Section 5. [sent-337, score-0.607]
82 To include all input information on inequality of events, we did not allow for edges between nodes containing two descriptions occurring together in one ESD. [sent-339, score-0.291]
83 The underlying assumption here is that two different event descriptions of the same ESD always represent distinct events. [sent-340, score-0.558]
84 The clustering baseline considers two phrases as paraphrases if they are in the same cluster. [sent-346, score-0.172]
85 It claims a happens-before relation between phrases e and f if some phrase in e’s cluster precedes some phrase in f’s cluster in the original ESDs. [sent-347, score-0.223]
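The two decision rules of the clustering baseline are simple enough to state as code. The sketch below assumes the clustering itself has already been computed and is given as a mapping `cluster_of` from each event description to a cluster id; that mapping, and the function names, are hypothetical and introduced only for illustration.

```python
from typing import Dict, List


def baseline_paraphrases(cluster_of: Dict[str, int], e: str, f: str) -> bool:
    """Two phrases count as paraphrases if they are in the same cluster."""
    return cluster_of[e] == cluster_of[f]


def baseline_happens_before(
    cluster_of: Dict[str, int], esds: List[List[str]], e: str, f: str
) -> bool:
    """True if some phrase in e's cluster precedes some phrase in f's cluster
    within at least one of the original ESDs."""
    ce, cf = cluster_of[e], cluster_of[f]
    for esd in esds:
        seen_e_cluster = False
        for phrase in esd:
            c = cluster_of[phrase]
            # Check f's cluster first so a phrase never "precedes" itself.
            if seen_e_cluster and c == cf:
                return True
            if c == ce:
                seen_e_cluster = True
    return False
```

Note that this rule only needs a single supporting ESD, so one unusually ordered description is enough for the baseline to claim a happens-before relation between two whole clusters.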
86 The columns labelled sys contain the results of our system, basecl describes the clustering baseline and baselev the Levenshtein baseline. [sent-364, score-0.211]
87 (For the average values, no sig [Table residue; columns: SCENARIO; PRECISION (sys, basecl, baselev); RECALL (sys, basecl, baselev); F-SCORE (sys, basecl, baselev, upper); scenario names and values not recoverable] [sent-374, score-0.478]
88 The only scenario in which our system doesn’t score very well is BUY FROM A VENDING MACHINE, where the upper bound is not significantly better either. [sent-414, score-0.307]
89 The clustering system, which can’t exploit the sequential information from the ESDs, has trouble distinguishing semantically similar phrases (high recall, low precision). [sent-415, score-0.149]
90 Regarding precision, our system outperforms both baselines in all scenarios except one (MAKE OMELETTE). [sent-420, score-0.172]
91 On average, the baselines do much better here than for the paraphrase task. [sent-424, score-0.152]
92 This is because once a system decides on paraphrase clusters that are essentially correct, it can retrieve correct information about the temporal order directly from the original ESDs. [sent-425, score-0.381]
93 One striking difference between the performance of our system on the OMICS data and on our own dataset is the relation to the upper bound: On our own data, the upper bound is almost always significantly better than our system, whereas significant differences are rare on OMICS. [sent-430, score-0.217]
94 1 Summary In this paper, we have described a novel approach to the unsupervised learning of temporal script information. [sent-436, score-0.628]
95 Our approach differs from previous work in that we collect training data by directly asking non-expert users to describe a scenario, and then apply a Multiple Sequence Alignment algorithm to extract scenario-specific paraphrase and temporal ordering information. [sent-437, score-0.412]
96 We showed that our system outperforms two baselines and sometimes approaches human-level performance, especially because it can exploit the sequential structure of the script descriptions to separate clusters of semantically similar events. [sent-438, score-0.752]
97 2 Discussion and Future Work We believe that we can scale this approach to model a large number of scenarios representing implicit shared knowledge. [sent-440, score-0.15]
98 This game will feature an algorithm that can generate new candidate scenarios without any supervision, for instance by identifying suitable sub-events of collected scripts (e. [sent-445, score-0.351]
99 Clustal: a package for performing multiple sequence alignment on a microcomputer. [sent-522, score-0.15]
100 Learning a probabilistic model of event sequences from internet weblog stories. [sent-547, score-0.357]
wordName wordTfidf (topN-words)
[('script', 0.379), ('esds', 0.313), ('event', 0.312), ('temporal', 0.249), ('descriptions', 0.246), ('msa', 0.21), ('scripts', 0.2), ('scenario', 0.169), ('sys', 0.133), ('events', 0.123), ('restaurant', 0.123), ('scenarios', 0.115), ('omics', 0.111), ('chambers', 0.108), ('counter', 0.105), ('food', 0.104), ('paraphrase', 0.095), ('eating', 0.092), ('graph', 0.09), ('sequence', 0.083), ('tsg', 0.083), ('commonsense', 0.081), ('upper', 0.079), ('walk', 0.078), ('phrases', 0.071), ('levenshtein', 0.069), ('alignment', 0.067), ('flake', 0.065), ('bound', 0.059), ('shopping', 0.059), ('baselines', 0.057), ('paraphrases', 0.056), ('esd', 0.055), ('swanson', 0.055), ('pay', 0.053), ('subj', 0.05), ('nathanael', 0.05), ('similarity', 0.049), ('scrambled', 0.048), ('eggs', 0.048), ('mueller', 0.048), ('enter', 0.048), ('wait', 0.048), ('cluster', 0.046), ('clustering', 0.045), ('pred', 0.045), ('sequences', 0.045), ('nodes', 0.045), ('manshadi', 0.044), ('singh', 0.044), ('story', 0.044), ('obj', 0.043), ('node', 0.04), ('eat', 0.04), ('exit', 0.039), ('bioinformatics', 0.039), ('mechanical', 0.037), ('braescecalllbaselev', 0.037), ('cashier', 0.037), ('cgap', 0.037), ('chamberlain', 0.037), ('cisomaibmrunoaysnkwfe', 0.037), ('durbin', 0.037), ('dustin', 0.037), ('lesptnhideoin', 0.037), ('needleman', 0.037), ('pbraeseccilsionbaselev', 0.037), ('stereotypical', 0.037), ('clusters', 0.037), ('von', 0.036), ('collected', 0.036), ('implicit', 0.035), ('barzilay', 0.035), ('ordering', 0.034), ('collect', 0.034), ('constraints', 0.034), ('sequential', 0.033), ('describes', 0.033), ('jurafsky', 0.033), ('ahn', 0.032), ('schank', 0.032), ('tsgs', 0.032), ('higgins', 0.032), ('rau', 0.032), ('regneri', 0.032), ('dna', 0.032), ('reid', 0.032), ('dabbish', 0.032), ('thfe', 0.032), ('erik', 0.032), ('elements', 0.032), ('merging', 0.031), ('excellence', 0.031), ('participants', 0.03), ('phrase', 0.03), ('pinkal', 0.03), ('homepage', 0.03), ('cheapest', 0.03), ('foo', 0.03), 
('rater', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 165 acl-2010-Learning Script Knowledge with Web Experiments
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
2 0.22800216 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
Author: Cosmin Bejan ; Sanda Harabagiu
Abstract: This paper examines how a new class of nonparametric Bayesian models can be effectively applied to an open-domain event coreference task. Designed with the purpose of clustering complex linguistic objects, these models consider a potentially infinite number of features and categorical outcomes. The evaluation performed for solving both within- and cross-document event coreference shows significant improvements of the models when compared against two baselines for this task.
3 0.1579607 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
Author: Francisco Costa ; Antonio Branco
Abstract: We describe the semi-automatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which TimeML annotated data was not available yet. In order to validate this adaptation, we use the obtained data to replicate some results in the literature that used the original English data. The fact that comparable results are obtained indicates that our approach can be used successfully to rapidly create semantically annotated resources for new languages.
4 0.12667088 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
Author: Tingxu Yan ; Tamsin Maxwell ; Dawei Song ; Yuexian Hou ; Peng Zhang
Abstract: Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone.
5 0.11473639 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
Author: WenTing Wang ; Jian Su ; Chew Lim Tan
Abstract: Syntactic knowledge is important for discourse relation recognition. Yet only heuristically selected flat paths and 2-level production rules have been used to incorporate such information so far. In this paper we propose using tree kernel based approach to automatically mine the syntactic information from the parse trees for discourse analysis, applying kernel function to the tree structures directly. These structural syntactic features, together with other normal flat features are incorporated into our composite kernel to capture diverse knowledge for simultaneous discourse identification and classification for both explicit and implicit relations. The experiment shows tree kernel approach is able to give statistical significant improvements over flat syntactic path feature. We also illustrate that tree kernel approach covers more structure information than the production rules, which allows tree kernel to further incorporate information from a higher dimension space for possible better discrimination. Besides, we further propose to leverage on temporal ordering information to constrain the interpretation of discourse relation, which also demonstrate statistical significant improvements for discourse relation recognition on PDTB 2.0 for both explicit and implicit as well.
6 0.089980975 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
7 0.083801478 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
8 0.071381778 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
9 0.068963796 196 acl-2010-Plot Induction and Evolutionary Search for Story Generation
10 0.066548899 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
11 0.065963447 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
12 0.065693401 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
13 0.065515324 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
14 0.065327212 133 acl-2010-Hierarchical Search for Word Alignment
15 0.065032452 141 acl-2010-Identifying Text Polarity Using Random Walks
16 0.062934846 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns
17 0.062907308 85 acl-2010-Detecting Experiences from Weblogs
18 0.062675633 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
19 0.062550552 127 acl-2010-Global Learning of Focused Entailment Graphs
20 0.062175211 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
topicId topicWeight
[(0, -0.191), (1, 0.037), (2, -0.023), (3, -0.059), (4, 0.057), (5, 0.039), (6, -0.01), (7, 0.025), (8, -0.003), (9, -0.084), (10, -0.066), (11, -0.006), (12, -0.022), (13, -0.02), (14, 0.025), (15, 0.034), (16, 0.04), (17, 0.045), (18, 0.023), (19, -0.003), (20, 0.199), (21, -0.049), (22, -0.016), (23, -0.042), (24, 0.246), (25, -0.059), (26, 0.033), (27, 0.04), (28, -0.011), (29, -0.213), (30, 0.167), (31, 0.168), (32, 0.022), (33, -0.012), (34, 0.058), (35, -0.018), (36, 0.173), (37, 0.03), (38, 0.088), (39, -0.158), (40, 0.019), (41, -0.003), (42, 0.08), (43, -0.026), (44, 0.084), (45, -0.141), (46, 0.05), (47, 0.078), (48, -0.014), (49, 0.091)]
simIndex simValue paperId paperTitle
same-paper 1 0.96024072 165 acl-2010-Learning Script Knowledge with Web Experiments
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
2 0.77305633 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
Author: Francisco Costa ; Antonio Branco
Abstract: We describe the semi-automatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which TimeML annotated data was not available yet. In order to validate this adaptation, we use the obtained data to replicate some results in the literature that used the original English data. The fact that comparable results are obtained indicates that our approach can be used successfully to rapidly create semantically annotated resources for new languages.
3 0.69788921 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
Author: Cosmin Bejan ; Sanda Harabagiu
Abstract: This paper examines how a new class of nonparametric Bayesian models can be effectively applied to an open-domain event coreference task. Designed with the purpose of clustering complex linguistic objects, these models consider a potentially infinite number of features and categorical outcomes. The evaluation performed for solving both within- and cross-document event coreference shows significant improvements of the models when compared against two baselines for this task.
4 0.61609817 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
Author: Tingxu Yan ; Tamsin Maxwell ; Dawei Song ; Yuexian Hou ; Peng Zhang
Abstract: Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone.
5 0.41961294 196 acl-2010-Plot Induction and Evolutionary Search for Story Generation
Author: Neil McIntyre ; Mirella Lapata
Abstract: In this paper we develop a story generator that leverages knowledge inherent in corpora without requiring extensive manual involvement. A key feature in our approach is the reliance on a story planner which we acquire automatically by recording events, their participants, and their precedence relationships in a training corpus. Contrary to previous work our system does not follow a generate-and-rank architecture. Instead, we employ evolutionary search techniques to explore the space of possible stories which we argue are well suited to the story generation task. Experiments on generating simple children’s stories show that our system outperforms previous data-driven approaches.
6 0.37370649 85 acl-2010-Detecting Experiences from Weblogs
7 0.37259501 111 acl-2010-Extracting Sequences from the Web
8 0.36530811 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
9 0.3646684 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet
10 0.36373556 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People
11 0.35790741 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
12 0.34272003 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation
13 0.33817315 126 acl-2010-GernEdiT - The GermaNet Editing Tool
14 0.32764387 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
15 0.32493263 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
16 0.32476556 28 acl-2010-An Entity-Level Approach to Information Extraction
17 0.32167283 63 acl-2010-Comparable Entity Mining from Comparative Questions
18 0.31651604 141 acl-2010-Identifying Text Polarity Using Random Walks
19 0.31405163 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
20 0.30166936 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
topicId topicWeight
[(7, 0.013), (25, 0.054), (39, 0.016), (42, 0.024), (44, 0.376), (59, 0.088), (73, 0.045), (76, 0.011), (78, 0.043), (80, 0.011), (83, 0.079), (84, 0.03), (97, 0.011), (98, 0.117)]
simIndex simValue paperId paperTitle
1 0.89922649 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unknown-abstract
same-paper 2 0.83780605 165 acl-2010-Learning Script Knowledge with Web Experiments
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
3 0.81034076 210 acl-2010-Sentiment Translation through Lexicon Induction
Author: Christian Scheible
Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
4 0.53322858 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
Author: Hiroshi Echizen-ya ; Kenji Araki
Abstract: As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation among human judgments, along with the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.
5 0.52278924 86 acl-2010-Discourse Structure: Theory, Practice and Use
Author: Bonnie Webber ; Markus Egg ; Valia Kordoni
Abstract: unknown-abstract
6 0.51320767 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
7 0.50104964 31 acl-2010-Annotation
8 0.49479067 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
9 0.49356768 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
10 0.48070163 158 acl-2010-Latent Variable Models of Selectional Preference
11 0.47930443 71 acl-2010-Convolution Kernel over Packed Parse Forest
12 0.47873479 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
13 0.47218806 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
14 0.47192588 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
15 0.47070235 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
16 0.47019336 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
17 0.46891952 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
18 0.46768862 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
19 0.46666166 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
20 0.46542343 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition