acl acl2012 acl2012-57 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ioannis Konstas ; Mirella Lapata
Abstract: This paper proposes a data-driven method for concept-to-text generation, the task of automatically producing textual output from non-linguistic input. A key insight in our approach is to reduce the tasks of content selection (“what to say”) and surface realization (“how to say”) into a common parsing problem. We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. We propose a novel decoding algorithm for finding the best scoring derivation and generating in this setting. Experimental evaluation on the ATIS domain shows that our model outperforms a competitive discriminative system both using BLEU and in a judgment elicitation study.
Reference: text
sentIndex sentText sentNum sentScore
1 A key insight in our approach is to reduce the tasks of content selection (“what to say”) and surface realization (“how to say”) into a common parsing problem. [sent-6, score-0.452]
2 We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. [sent-7, score-0.674]
3 The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. [sent-8, score-0.429]
4 We propose a novel decoding algorithm for finding the best scoring derivation and generating in this setting. [sent-9, score-0.241]
5 1 Introduction Concept-to-text generation broadly refers to the task of automatically producing textual output from non-linguistic input such as databases of records, logical form, and expert system knowledge bases (Reiter and Dale, 2000). [sent-11, score-0.198]
6 In this paper we present a data-driven approach to concept-to-text generation that is domain-independent, conceptually simple, and flexible. [sent-23, score-0.157]
7 Our generator learns from a set of database records and textual descriptions (for some of them). [sent-24, score-0.477]
8 Here, the records provide a structured representation of the flight details (e. [sent-26, score-0.447]
9 Given such input, our model determines which records to talk about (content selection) and which words to use for describing them (surface realization). [sent-29, score-0.376]
10 Rather than breaking up the generation process into a sequence of local decisions, we perform both tasks jointly. [sent-30, score-0.183]
11 A key insight in our approach is to reduce content selection and surface realization into a common parsing problem. [sent-31, score-0.452]
12 Specifically, we define a probabilistic context-free grammar (PCFG) that captures the structure of the database and its correspondence to natural language. [sent-32, score-0.212]
13 This grammar represents multiple derivations which we encode compactly using a weighted hypergraph (or packed forest), a data structure that defines a weight for each tree. [sent-33, score-0.753]
14 Following a generative approach, we could first learn the weights of the PCFG by maximising the joint likelihood of the model and then perform generation by finding the best derivation tree in the hypergraph. [sent-34, score-0.318]
15 The performance of this baseline system could be potentially further improved using discriminative reranking (Collins, 2000). [sent-35, score-0.177]
16 Figure 1: Example of non-linguistic input as a structured database and logical form and its corresponding text. [sent-40, score-0.152]
17 We omit record fields that have no value, for the sake of brevity. [sent-41, score-0.492]
18 An appealing alternative is to rerank the hypergraph directly (Huang, 2008). [sent-43, score-0.323]
19 As it compactly encodes exponentially many derivations, we can explore a much larger hypothesis space than would have been possible with an n-best list. [sent-44, score-0.172]
20 Importantly, in this framework non-local features are computed at all internal hypergraph nodes, allowing the decoder to take advantage of them continuously at all stages of the generation process. [sent-45, score-0.485]
21 We incorporate features that are local with respect to a span of a sub-derivation in the packed forest; we also (approximately) include features that arbitrarily exceed span boundaries, thus capturing more global knowledge. [sent-46, score-0.249]
22 Experimental results on the ATIS dataset (Dahl et al., 1994) demonstrate that our model outperforms a baseline based on the best derivation and a state-of-the-art discriminative system (Angeli et al., 2010). [sent-48, score-0.231]
23 2 Related Work Early discriminative approaches to text generation were introduced in spoken dialogue systems, and usually tackled content selection and surface realization separately. [sent-51, score-0.607]
24 Ratnaparkhi (2002) conceptualized surface realization (from a fixed meaning representation) as a classification task. [sent-52, score-0.279]
25 More recently, Wong and Mooney (2007) describe an approach to surface realization based on synchronous context-free grammars. [sent-56, score-0.279]
26 Angeli et al. (2010) were the first to propose a unified approach to content selection and surface realization. [sent-59, score-0.244]
27 Their model operates over automatically induced alignments of words to database records (Liang et al. [sent-60, score-0.441]
28 , 2009) and decomposes into a sequence of discriminative local decisions. [sent-61, score-0.144]
29 They first determine which records in the database to talk about, then which fields of those records to mention, and finally which words to use to describe the chosen fields. [sent-62, score-0.938]
30 Their surface realization component performs decisions based on templates that are automatically extracted and smoothed with domain-specific knowledge in order to guarantee fluent output. [sent-64, score-0.279]
31 Our model is closest to Huang (2008) who also performs forest reranking on a hypergraph, using both local and non-local features, whose weights are tuned with the averaged perceptron algorithm (Collins, 2002). [sent-67, score-0.37]
32 We adapt forest reranking to generation and introduce several task-specific features that boost performance. [sent-68, score-0.362]
33 In contrast to Angeli et al. (2010), our model optimizes content selection and surface realization simultaneously, rather than as a sequence. [sent-70, score-0.447]
34 We have a single reranking component that applies throughout, whereas they train different discriminative models for each local decision. [sent-72, score-0.243]
35 3 Problem Formulation We assume our generator takes as input a set of database records d and produces text w that verbalizes some of these records. [sent-73, score-0.537]
36 For example, in Figure 1, flight is a record type with fields from and to. [sent-83, score-0.708]
37 The values of these fields are denver and boston and their type is categorical. [sent-84, score-0.22]
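A minimal sketch of how such an input might be represented; the class and attribute names below are illustrative, not taken from the paper:

    # Illustrative data structures for a scenario: typed records whose named
    # fields hold categorical or integer values.
    from dataclasses import dataclass, field
    from typing import Dict, Union

    @dataclass
    class Field:
        name: str                   # e.g. "from", "to"
        ftype: str                  # "categorical" or "integer"
        value: Union[str, int]      # e.g. "denver", "boston", 1600

    @dataclass
    class Record:
        rtype: str                  # e.g. "flight", "condition", "search"
        fields: Dict[str, Field] = field(default_factory=dict)

    # The flight record of Figure 1, reduced to its from/to fields.
    flight1 = Record("flight", {
        "from": Field("from", "categorical", "denver"),
        "to":   Field("to", "categorical", "boston"),
    })
    scenario = [flight1]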
38 The training data consists of database records paired with texts like those shown in Figure 1. [sent-87, score-0.441]
39 The database (and accompanying texts) are next converted into a PCFG whose weights are learned from training data. [sent-88, score-0.242]
40 PCFG derivations are represented as a weighted directed hypergraph (Gallo et al., 1993). [sent-89, score-0.415]
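A weighted directed hypergraph can be sketched as below; this is only an assumed reading of the data structure (names are illustrative), not the authors' implementation:

    # A hyperarc connects a list of tail (antecedent) nodes to a single head node
    # and carries the PCFG rule instance and its weight; a node packs together all
    # hyperarcs that can derive it, which is why the structure compactly encodes
    # exponentially many derivations.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass(eq=False)            # identity-based hashing, so nodes can key dicts
    class Node:
        label: str                  # e.g. "R(flight1)", "F(flight1,to)", "W"
        incoming: List["Hyperarc"] = field(default_factory=list)

    @dataclass(eq=False)
    class Hyperarc:
        head: Node
        tails: List[Node]
        rule: str                   # e.g. "FS(flight1,from) -> F(flight1,to) FS(flight1,to)"
        weight: float               # log-probability of the rule instance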
41 The weights on the hyperarcs are defined by a variety of feature functions, which we learn via a discriminative online update algorithm. [sent-91, score-0.173]
42 During testing, we are given a set of database records without the corresponding text. [sent-92, score-0.441]
43 Using the learned feature weights, we compile a hypergraph specific to this test input and decode it approximately (Huang, 2008). [sent-93, score-0.407]
44 The hypergraph representation allows us to decompose the feature functions and compute them piecemeal at each hyperarc (or sub-derivation), rather than at the root node as in conventional n-best list reranking. [sent-94, score-0.412]
45 Note that the algorithm does not separate content selection from surface realization; both subtasks are optimized jointly through the probabilistic parsing formulation. [sent-95, score-0.284]
46 3.1 Grammar Definition We capture the structure of the database with a number of CFG rewrite rules, in a similar way to Liang et al. (2009). [sent-97, score-0.152]
47 These rules are purely syntactic (describing the intuitive relationship between records, records and fields, fields and corresponding words), and could apply to any database with similar structure irrespective of the semantics of the domain. [sent-99, score-0.639]
48 Rule weights are governed by an underlying multinomial distribution and are shown in square brackets. [sent-101, score-0.174]
49 gen(f.v) is a function for generating integer numbers given the value of a field f. [sent-104, score-0.306]
50 Rule (1) denotes the expansion from the start symbol S to record R, which has the special start type (hence the notation R(start)). [sent-106, score-0.519]
51 Rule (2) defines a chain between two consecutive records ri and rj. [sent-107, score-0.39]
52 Here, FS(rj, start) represents the set of fields of the target rj, following the source record R(ri). [sent-108, score-0.492]
53 R(rj.t) is a non-terminal place-holder for the continuation of the chain of records, and start in FS is a special boundary field between consecutive records. [sent-115, score-0.399]
54 The weight of this rule is the bigram probability of two records conditioned on their type, multiplied with a normalization factor λ. [sent-116, score-0.501]
55 We also include a special null record type, i.e., a record that has no fields and acts as a smoother for words that may not correspond to a particular record. [sent-119, score-0.53]
56 Rule (3) is simply an escape rule, so that the parsing process (on the record level) can finish. [sent-120, score-0.37]
57 Rule (4) is the equivalent of rule (2) at the field level, i.e. [sent-121, score-0.35]
58 it describes the chaining of two consecutive fields fi and fj. [sent-123, score-0.221]
59 For example, the rule FS(flight1, from) → F(flight1, to) FS(flight1, to) specifies that we should talk about the field to of record flight1, after talking about the field from. [sent-126, score-0.936]
60 Analogously to the record level, we have also included a special null field type for the emission of words that do not correspond to a specific record field. [sent-127, score-1.138]
61 Rule (6) defines the expansion of field F to a sequence of (binarized) words W, with a weight equal to the bigram probability of the current word given the previous word, the current record, and field. [sent-128, score-0.381]
62 Rules (8) and (9) define the emission of words and integer numbers from W, given a field type and its value. [sent-129, score-0.399]
63 Its weight defines a multinomial distribution over all seen words, for every value of field f, given that the field type is categorical or the special null field. [sent-131, score-0.859]
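One simple way to obtain such multinomial weights is relative-frequency estimation over words aligned to each field value; the maximum-likelihood sketch below is an assumption for illustration, not necessarily the estimator used in the paper:

    from collections import Counter, defaultdict

    def estimate_word_emissions(aligned_pairs):
        """Estimate P(word | field value) from (field_value, word) pairs
        harvested from the training alignments."""
        counts = defaultdict(Counter)
        for value, word in aligned_pairs:
            counts[value][word] += 1
        emissions = {}
        for value, word_counts in counts.items():
            total = sum(word_counts.values())
            emissions[value] = {w: c / total for w, c in word_counts.items()}
        return emissions

    # e.g. estimate_word_emissions([("denver", "denver"), ("denver", "from"), ...])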
64 Rule (9) is identical but for fields whose type is integer. [sent-132, score-0.22]
65 gen(f.v) generates an integer number given the field value, using one of the following six ways (Liang et al., 2009): [sent-134, score-0.306]
66 identical to the field value, rounding up or rounding down to a multiple of 5, rounding off to the closest multiple of 5, and finally adding or subtracting some unexplained noise. [sent-135, score-0.529]
67 The weight is a multinomial over the six generation function modes, given the record field f. [sent-136, score-0.801]
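The six generation modes for integer values can be made concrete with a small sketch; the noise modes are shown schematically, since the exact noise model is not specified here:

    def gen_modes(v, noise=1):
        """The six ways an integer field value v may surface as a number
        (Liang et al., 2009): identity, rounding up/down/off to a multiple of 5,
        and adding or subtracting some unexplained noise."""
        return {
            "identity":    v,
            "round_up":    ((v + 4) // 5) * 5,
            "round_down":  (v // 5) * 5,
            "round_off":   int(round(v / 5.0)) * 5,
            "plus_noise":  v + noise,   # placeholder noise value
            "minus_noise": v - noise,
        }

    # gen_modes(1993) -> identity 1993, round_up 1995, round_down 1990,
    #                    round_off 1995, plus/minus noise 1994 / 1992.
    # At decoding time the model weights these modes with a learned multinomial
    # conditioned on the record field f.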
68 The grammar is instantiated for each scenario (i.e., a set of database records), which we represent compactly using a hypergraph or a packed forest (Klein and Manning, 2001; Huang, 2008). [sent-139, score-0.728]
69 3.2 Hypergraph Reranking For our generation task, we are given a set of database records d, and our goal is to find the best corresponding text w. [sent-142, score-0.558]
70 This corresponds to the best grammar derivation among a set of candidate derivations represented implicitly in the hypergraph structure. [sent-143, score-0.628]
71 As shown in Algorithm 1, the perceptron makes several passes over the training scenarios, and in each iteration it computes the best scoring (ŵ, ĥ) among the candidate derivations, given the current weights α. [sent-162, score-0.148]
72 In line 6, the algorithm updates α with the difference (if any) between the feature representations of the best scoring derivation (ŵ, ĥ) and the oracle derivation (w∗, h+). [sent-163, score-0.478]
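The update in line 6 corresponds to the standard structured perceptron rule; a minimal sketch, with phi(d) returning a sparse feature vector (dict) for a derivation d:

    def perceptron_update(alpha, phi, best, oracle, rate=1.0):
        """Move the weights alpha towards the oracle derivation (w*, h+) and
        away from the current best-scoring derivation (w^, h^)."""
        if best == oracle:
            return alpha                      # no mistake, no update
        phi_best, phi_oracle = phi(best), phi(oracle)
        for f in set(phi_best) | set(phi_oracle):
            alpha[f] = alpha.get(f, 0.0) + rate * (phi_oracle.get(f, 0.0) - phi_best.get(f, 0.0))
        return alpha

    # Averaging the weight vectors over all updates (Collins, 2002) is omitted
    # here for brevity.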
73 (Section 3.3) and discuss our definition for the oracle derivation (w∗, h+) (Section 3.4). [sent-170, score-0.232]
74 According to the grammar in Table 1, there is no direct hyperedge between nodes representing words (W) and nodes representing the set of fields these correspond to (FS); rather, W and FS are connected implicitly via individual fields (F). [sent-182, score-0.591]
75 Note that in order to estimate the trigram feature at the FS node, we need to carry word information in the derivations of its antecedents, as we go bottom-up. [sent-183, score-0.221]
76 Essentially, we perform bottom-up Viterbi search, visiting the nodes in reverse topological order, and keeping the k-best derivations for each. [sent-185, score-0.19]
77 The score of each derivation is a linear combination of weighted local and non-local features. [sent-186, score-0.262]
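A simplified sketch of this bottom-up pass; it enumerates antecedent combinations exhaustively, whereas the paper decodes approximately following Huang (2008), and the node/feature interfaces below are assumptions:

    import heapq
    from itertools import product

    def kbest_decode(nodes_bottom_up, alpha, local_feats, nonlocal_feats, k=10):
        """Keep the k best sub-derivations per node, scored by weighted local
        features (per hyperarc) plus non-local features (per node)."""
        best = {}                                  # node -> list of (score, derivation)
        for node in nodes_bottom_up:               # leaves first (reverse topological order)
            candidates = []
            for arc in node.incoming:              # leaves are reached via arcs with no tails
                for combo in product(*(best[t] for t in arc.tails)):
                    score = sum(s for s, _ in combo)
                    deriv = (arc, [d for _, d in combo])
                    score += sum(alpha.get(f, 0.0) * v for f, v in local_feats(arc).items())
                    score += sum(alpha.get(f, 0.0) * v for f, v in nonlocal_feats(node, deriv).items())
                    candidates.append((score, deriv))
            best[node] = heapq.nlargest(k, candidates, key=lambda c: c[0])
        return best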
78 In machine translation, a decoder that implements forest rescoring (Huang and Chiang, 2007) uses the language model as an external criterion of the goodness of sub-translations on account of their grammaticality. [sent-187, score-0.182]
79 Analogously here, non-local features influence the selection of the best combinations, by introducing knowledge that exceeds the confines of the node under consideration and thus depend on the sub-derivations generated so far. [sent-188, score-0.228]
80 (e.g., word trigrams spanning a field node rely on evidence from antecedent nodes that may be arbitrarily deeper than the field’s immediate children). [sent-191, score-0.452]
81 Since in generation we must emit rather than observe the words, for each leaf node we output the k-best words according to the learned weights α of the Alignment feature (see Section 4). [sent-193, score-0.376]
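Selecting the words to emit at a leaf can be sketched as taking the k highest-weighted vocabulary items under the learned alignment feature; the feature-key layout below is an assumption for illustration:

    import heapq

    def kbest_words(alpha, vocabulary, record_type, field_name, k=5):
        """Return the k words with the highest learned weight for an alignment
        feature keyed on (record type, field, word)."""
        scored = ((alpha.get(("align", record_type, field_name, w), 0.0), w)
                  for w in vocabulary)
        return [w for _, w in heapq.nlargest(k, scored)]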
82 This generation task is far from trivial: the search space on the word level is the size of the vocabulary and each field of a record can potentially generate all words. [sent-195, score-0.703]
83 Also, note that in decoding it is useful to have a way to score different output lengths. (Footnote 2: We also store field information for the features described in Section 4.) [sent-196, score-0.298]
84 Rather than setting w to a fixed length, we rely on a linear regression predictor that uses the counts of each record type per scenario as features and is able to produce variable length texts. [sent-199, score-0.477]
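The length predictor can be sketched as a linear model over record-type counts; the coefficients in the example are made up, and the exact regression used in the paper is not specified here:

    def predict_length(coeffs, bias, record_type_counts):
        """coeffs: record type -> regression coefficient learned on training
        scenarios; record_type_counts: record type -> count in the current
        scenario. Returns the expected number of words to generate."""
        y = bias + sum(coeffs.get(t, 0.0) * c for t, c in record_type_counts.items())
        return max(1, int(round(y)))

    # e.g. predict_length({"flight": 6.0, "condition": 2.5}, 3.0,
    #                     {"flight": 2, "condition": 1})  ->  18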
85 3.4 Oracle Derivation So far we have remained agnostic with respect to the oracle derivation (w∗, h+). [sent-201, score-0.232]
86 We do not have the gold-standard alignment between the database records and the text that verbalizes them. [sent-204, score-0.542]
87 We used the ATIS dataset (Dahl et al., 1994), which consists of transcriptions of spontaneous utterances of users interacting with a hypothetical online flight booking system. [sent-212, score-0.251]
88 (e.g., flights from orlando to milwaukee, show flights from orlando to milwaukee leaving after six o’clock), each accompanied with an SQL query to a booking system and the results of this query. [sent-215, score-0.377]
89 (e.g., a question about the origin of a flight or its time of arrival). [sent-218, score-0.158]
90 We used the version of the dataset created by Zettlemoyer and Collins (2007) instead, which combines the utterances of a single user in one scenario and contains 5,426 scenarios in total; each scenario corresponds to a (manually annotated) formal meaning representation (λ-expression) and its translation in natural language. [sent-224, score-0.208]
91 Lambda expressions were automatically converted into records, fields and values following the conventions adopted in Liang et al. (2009). [sent-225, score-0.162]
92 Given a lambda expression like the one shown in Figure 1, we first create a record for each variable and constant (e. [sent-227, score-0.409]
93 We then assign record types according to the corresponding class types (e. [sent-230, score-0.33]
94 Next, fields and values are added from predicates with two arguments with the class type of the first argument matching that of the record type. [sent-233, score-0.55]
95 We also defined special record types, such as condition and search. [sent-235, score-0.367]
96 The latter is introduced for every lambda operator and assigned the categorical field what with the value flight which refers to the record type of variable x. [sent-236, score-0.981]
97 In addition, we used a generatively trained PCFG as a baseline feature and an alignment feature based on the cooccurrence of records (or fields) with words. [sent-246, score-0.424]
98 Alignment Features Instances of this feature family refer to the count of each PCFG rule from Table 1. [sent-257, score-0.176]
99 They also tackle anomalies in the generated output, due to the ergodicity of the CFG rules at the record and field level. Word Bigrams/Trigrams: This is a group of non-local feature functions that count word n-grams at every level in the hypergraph (see Figure 2(b)). [sent-261, score-0.952]
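Counting word n-grams at an internal node requires the boundary words carried up from its antecedent sub-derivations (as noted for the FS node above); a hedged sketch of the counting step:

    from collections import Counter

    def ngram_features(words, n=3, prefix="w3"):
        """Count word n-grams in the partial string yielded by a sub-derivation;
        applied at every hypergraph node as it is built bottom-up."""
        feats = Counter()
        for i in range(len(words) - n + 1):
            feats[(prefix,) + tuple(words[i:i + n])] += 1
        return feats

    # `words` is the concatenation of the word sequences of the antecedent
    # sub-derivations plus any words emitted by the current rule.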
100 For example, fields from and to frequently correspond to two or three words such as ‘new york’ and ‘salt lake city’ (see Figure 2(d)). [sent-265, score-0.2]
wordName wordTfidf (topN-words)
[('record', 0.33), ('records', 0.289), ('hypergraph', 0.283), ('field', 0.256), ('fs', 0.182), ('realization', 0.168), ('fields', 0.162), ('flight', 0.158), ('derivation', 0.153), ('database', 0.152), ('atis', 0.151), ('compactly', 0.132), ('derivations', 0.132), ('generation', 0.117), ('surface', 0.111), ('forest', 0.103), ('pcfg', 0.103), ('reranking', 0.099), ('rule', 0.094), ('angeli', 0.091), ('rounding', 0.091), ('huang', 0.086), ('flights', 0.079), ('konstas', 0.079), ('lambda', 0.079), ('oracle', 0.079), ('selection', 0.078), ('discriminative', 0.078), ('scenarios', 0.076), ('liang', 0.072), ('local', 0.066), ('confines', 0.06), ('milwaukee', 0.06), ('verbalizes', 0.06), ('grammar', 0.06), ('consecutive', 0.059), ('nodes', 0.058), ('categorical', 0.058), ('travel', 0.058), ('packed', 0.058), ('type', 0.058), ('collins', 0.057), ('analogously', 0.056), ('content', 0.055), ('null', 0.054), ('perceptron', 0.054), ('hyperedge', 0.053), ('arrival', 0.053), ('booking', 0.053), ('orlando', 0.053), ('air', 0.052), ('trigrams', 0.052), ('multinomial', 0.052), ('cfg', 0.05), ('integer', 0.05), ('weights', 0.048), ('dahl', 0.048), ('feature', 0.047), ('start', 0.047), ('node', 0.047), ('weight', 0.046), ('scoring', 0.046), ('talk', 0.046), ('scenario', 0.046), ('rj', 0.045), ('features', 0.043), ('learned', 0.042), ('decoder', 0.042), ('refers', 0.042), ('reiter', 0.042), ('trigram', 0.042), ('mooney', 0.042), ('decoding', 0.042), ('defines', 0.042), ('describing', 0.041), ('alignment', 0.041), ('rerank', 0.04), ('exponentially', 0.04), ('conceptually', 0.04), ('emit', 0.04), ('parentheses', 0.04), ('utterances', 0.04), ('parsing', 0.04), ('arbitrarily', 0.039), ('broadly', 0.039), ('correspond', 0.038), ('dataset', 0.037), ('bigram', 0.037), ('goodness', 0.037), ('tures', 0.037), ('square', 0.037), ('special', 0.037), ('generator', 0.036), ('dale', 0.036), ('zettlemoyer', 0.036), ('rules', 0.036), ('rather', 0.035), ('decode', 0.035), ('emission', 0.035), ('ofthis', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
Author: Ioannis Konstas ; Mirella Lapata
Abstract: This paper proposes a data-driven method for concept-to-text generation, the task of automatically producing textual output from non-linguistic input. A key insight in our approach is to reduce the tasks of content selection (“what to say”) and surface realization (“how to say”) into a common parsing problem. We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. We propose a novel decoding algorithm for finding the best scoring derivation and generating in this setting. Experimental evaluation on the ATIS domain shows that our model outperforms a competitive discriminative system both using BLEU and in a judgment elicitation study.
2 0.15714863 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention detection and template filling tasks.
3 0.13546418 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
5 0.1102233 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.
6 0.10856622 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
7 0.097041279 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
8 0.0945848 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
9 0.092359766 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
10 0.090530328 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
11 0.086274192 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
12 0.086031243 71 acl-2012-Dependency Hashing for n-best CCG Parsing
13 0.082112938 108 acl-2012-Hierarchical Chunk-to-String Translation
14 0.079224184 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
15 0.076013319 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
16 0.075511158 83 acl-2012-Error Mining on Dependency Trees
17 0.074483164 51 acl-2012-Collective Generation of Natural Image Descriptions
18 0.07397449 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
19 0.072232656 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
20 0.071863726 94 acl-2012-Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection
topicId topicWeight
[(0, -0.242), (1, -0.032), (2, -0.063), (3, -0.023), (4, -0.072), (5, 0.024), (6, 0.005), (7, 0.078), (8, 0.042), (9, 0.017), (10, -0.036), (11, -0.065), (12, -0.132), (13, 0.007), (14, -0.015), (15, 0.002), (16, 0.051), (17, 0.091), (18, 0.001), (19, -0.058), (20, 0.047), (21, 0.02), (22, -0.048), (23, 0.009), (24, -0.036), (25, 0.028), (26, 0.019), (27, -0.129), (28, -0.095), (29, -0.061), (30, 0.046), (31, 0.006), (32, 0.066), (33, 0.031), (34, -0.142), (35, 0.009), (36, 0.052), (37, 0.012), (38, -0.04), (39, -0.065), (40, 0.195), (41, -0.01), (42, -0.076), (43, -0.073), (44, 0.055), (45, 0.021), (46, 0.026), (47, -0.157), (48, 0.007), (49, 0.208)]
simIndex simValue paperId paperTitle
same-paper 1 0.95790601 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
Author: Ioannis Konstas ; Mirella Lapata
Abstract: This paper proposes a data-driven method for concept-to-text generation, the task of automatically producing textual output from non-linguistic input. A key insight in our approach is to reduce the tasks of content selection (“what to say”) and surface realization (“how to say”) into a common parsing problem. We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. We propose a novel decoding algorithm for finding the best scoring derivation and generating in this setting. Experimental evaluation on the ATIS domain shows that our model outperforms a competitive discriminative system both using BLEU and in a judgment elicitation study.
Author: Patrick Simianer ; Stefan Riezler ; Chris Dyer
Abstract: With a few exceptions, discriminative training in statistical machine translation (SMT) has been content with tuning weights for large feature sets on small development data. Evidence from machine learning indicates that increasing the training sample size results in better prediction. The goal of this paper is to show that this common wisdom can also be brought to bear upon SMT. We deploy local features for SCFG-based SMT that can be read off from rules at runtime, and present a learning algorithm that applies ℓ1/ℓ2 regularization for joint feature selection over distributed stochastic learning processes. We present experiments on learning on 1.5 million training sentences, and show significant improvements over tuning discriminative models on small development sets.
3 0.61121249 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
Author: Yunyao Li ; Laura Chiticariu ; Huahai Yang ; Frederick Reiss ; Arnaldo Carreno-fuentes
Abstract: Information extraction (IE) is becoming a critical building block in many enterprise applications. In order to satisfy the increasing text analytics demands of enterprise applications, it is crucial to enable developers with general computer science background to develop high quality IE extractors. In this demonstration, we present WizIE, an IE development environment intended to reduce the development life cycle and enable developers with little or no linguistic background to write high quality IE rules. WizIE provides an integrated wizard-like environment that guides IE developers step-by-step throughout the entire development process, based on best practices synthesized from the experience of expert developers. In addition, WizIE reduces the manual effort involved in performing key IE development tasks by offering automatic result explanation and rule discovery functionality. Preliminary results indicate that WizIE is a step forward towards enabling extractor development for novice IE developers.
4 0.59691423 73 acl-2012-Discriminative Learning for Joint Template Filling
Author: Einat Minkov ; Luke Zettlemoyer
Abstract: This paper presents a joint model for template filling, where the goal is to automatically specify the fields of target relations such as seminar announcements or corporate acquisition events. The approach models mention detection, unification and field extraction in a flexible, feature-rich model that allows for joint modeling of interdependencies at all levels and across fields. Such an approach can, for example, learn likely event durations and the fact that start times should come before end times. While the joint inference space is large, we demonstrate effective learning with a Perceptron-style approach that uses simple, greedy beam decoding. Empirical results in two benchmark domains demonstrate consistently strong performance on both mention detection and template filling tasks.
5 0.58206147 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other highperformance parsers.
6 0.55520117 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
8 0.51362073 108 acl-2012-Hierarchical Chunk-to-String Translation
9 0.50232863 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
10 0.4907701 107 acl-2012-Heuristic Cube Pruning in Linear Time
11 0.48769164 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
12 0.47106513 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
13 0.46346188 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
14 0.45800158 74 acl-2012-Discriminative Pronunciation Modeling: A Large-Margin, Feature-Rich Approach
15 0.4541069 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling
16 0.45076618 121 acl-2012-Iterative Viterbi A* Algorithm for K-Best Sequential Decoding
17 0.45003572 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
18 0.44242731 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
19 0.43404958 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation
20 0.40143937 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
topicId topicWeight
[(26, 0.038), (28, 0.051), (30, 0.027), (37, 0.032), (39, 0.042), (57, 0.01), (59, 0.015), (74, 0.036), (82, 0.401), (84, 0.025), (85, 0.028), (90, 0.123), (92, 0.05), (94, 0.014), (99, 0.05)]
simIndex simValue paperId paperTitle
1 0.85673809 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens
Abstract: We propose a new approach to characterizing the timeline of a text: temporal dependency structures, where all the events of a narrative are linked via partial ordering relations like BEFORE, AFTER, OVERLAP and IDENTITY. We annotate a corpus of children’s stories with temporal dependency trees, achieving agreement (Krippendorff’s Alpha) of 0.856 on the event words, 0.822 on the links between events, and of 0.700 on the ordering relation labels. We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596. Our analysis of the dependency parser errors gives some insights into future research directions.
2 0.85344505 188 acl-2012-Subgroup Detector: A System for Detecting Subgroups in Online Discussions
Author: Amjad Abu-Jbara ; Dragomir Radev
Abstract: We present Subgroup Detector, a system for analyzing threaded discussions and identifying the attitude of discussants towards one another and towards the discussion topic. The system uses attitude predictions to detect the split of discussants into subgroups of opposing views. The system uses an unsupervised approach based on rule-based opinion target detecting and unsupervised clustering techniques. The system is open source and is freely available for download. An online demo of the system is available at: http://clair.eecs.umich.edu/SubgroupDetector/
same-paper 3 0.82631326 57 acl-2012-Concept-to-text Generation via Discriminative Reranking
Author: Ioannis Konstas ; Mirella Lapata
Abstract: This paper proposes a data-driven method for concept-to-text generation, the task of automatically producing textual output from non-linguistic input. A key insight in our approach is to reduce the tasks of content selection (“what to say”) and surface realization (“how to say”) into a common parsing problem. We define a probabilistic context-free grammar that describes the structure of the input (a corpus of database records and text describing some of them) and represent it compactly as a weighted hypergraph. The hypergraph structure encodes exponentially many derivations, which we rerank discriminatively using local and global features. We propose a novel decoding algorithm for finding the best scoring derivation and generating in this setting. Experimental evaluation on the ATIS domain shows that our model outperforms a competitive discriminative system both using BLEU and in a judgment elicitation study.
4 0.81023765 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction
Author: Seokhwan Kim ; Gary Geunbae Lee
Abstract: Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision. This paper proposes a novel graph-based projection approach and demonstrates the merits of it by using a Korean relation extraction system based on projected dataset from an English-Korean parallel corpus.
5 0.66970432 187 acl-2012-Subgroup Detection in Ideological Discussions
Author: Amjad Abu-Jbara ; Pradeep Dasigi ; Mona Diab ; Dragomir Radev
Abstract: The rapid and continuous growth of social networking sites has led to the emergence of many communities of communicating groups. Many of these groups discuss ideological and political topics. It is not uncommon that the participants in such discussions split into two or more subgroups. The members of each subgroup share the same opinion toward the discussion topic and are more likely to agree with members of the same subgroup and disagree with members from opposing subgroups. In this paper, we propose an unsupervised approach for automatically detecting discussant subgroups in online communities. We analyze the text exchanged between the participants of a discussion to identify the attitude they carry toward each other and towards the various aspects of the discussion topic. We use attitude predictions to construct an attitude vector for each discussant. We use clustering techniques to cluster these vectors and, hence, determine the subgroup membership of each participant. We compare our methods to text clustering and other baselines, and show that our method achieves promising results.
6 0.57993245 191 acl-2012-Temporally Anchored Relation Extraction
8 0.54129684 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive
9 0.53136802 31 acl-2012-Authorship Attribution with Author-aware Topic Models
10 0.52466363 60 acl-2012-Coupling Label Propagation and Constraints for Temporal Fact Extraction
11 0.51186645 99 acl-2012-Finding Salient Dates for Building Thematic Timelines
12 0.50771207 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text
13 0.50326228 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
14 0.49539131 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
15 0.49172932 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
16 0.48756331 37 acl-2012-Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
17 0.47830421 73 acl-2012-Discriminative Learning for Joint Template Filling
18 0.47771099 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
19 0.47671151 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
20 0.4765828 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning