acl acl2013 acl2013-324 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shu Cai ; Kevin Knight
Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate whole-sentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.
Reference: text
sentIndex sentText sentNum sentScore
1 Smatch: an Evaluation Metric for Semantic Feature Structures Shu Cai USC Information Sciences Institute 4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292 shucai@isi. [sent-1, score-0.043]
2 edu Abstract The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. [sent-2, score-0.332]
3 However, there is no widely-used metric to evaluate whole-sentence semantic structures. [sent-3, score-0.25]
4 In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. [sent-4, score-0.26]
5 We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study. [sent-5, score-0.171]
6 1 Introduction The goal of semantic parsing is to generate all semantic relationships in a text. [sent-6, score-0.208]
7 Its output is often represented by whole-sentence semantic structures. [sent-7, score-0.084]
8 Evaluating such structures is necessary for semantic parsing tasks, as well as semantic annotation tasks which create linguistic resources for semantic parsing. [sent-8, score-0.365]
9 However, there is no widely-used evaluation method for whole-sentence semantic structures. [sent-9, score-0.084]
10 Current whole-sentence semantic parsing is mainly evaluated in two ways: 1. [sent-10, score-0.124]
11 task correctness (Tang and Mooney, 2001), which evaluates on an NLP task that uses the parsing results; 2. [sent-11, score-0.101]
12 Nevertheless, it is worthwhile to explore evaluation methods that use scores which range from 0 to 1 (“partial credit”) to measure whole-sentence semantic structures. [sent-13, score-0.113]
13 By using such methods, we are able to differentiate between two similar whole-sentence semantic structures regardless of specific [second author block: Kevin Knight, USC Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292, knight@isi.edu] [sent-14, score-0.2]
14 In this work, we provide an evaluation metric that uses the degree of overlap between two whole-sentence semantic structures as the partial credit. [sent-16, score-0.3]
15 In this paper, we observe that the difficulty of computing the degree of overlap between two whole-sentence semantic feature structures comes from determining an optimal variable alignment between them, and further prove that finding such an alignment is NP-complete. [sent-17, score-0.391]
16 We investigate how to compute this metric and provide several practical and replicable computing methods by using Integer Linear Programming (ILP) and a hill-climbing method. [sent-18, score-0.164]
17 We show that our metric can be used for measuring inter-annotator agreement in large-scale linguistic annotation, and for evaluating semantic parsing. [sent-19, score-0.175]
18 2 Semantic Overlap We work on a semantic feature structure representation in a standard neo-Davidsonian (Davidson, 1969; Parsons, 1990) framework. [sent-20, score-0.084]
19 For example, the semantics of the sentence “the boy wants to go” is represented by a directed graph (reconstructed below). In this graph, there are three concepts: want-01, boy, and go-01. [sent-21, score-0.153]
20 The frame want-01 has two arguments connected with ARG0 and ARG1, and go-01 has an argument (which is also the same boy instance) connected with ARG0. [sent-23, score-0.118]
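The graph image itself did not survive extraction, but the two sentences above determine it completely. A reconstruction as propositional triples, with illustrative variable names w, b, and g:

    instance(w, want-01)
    instance(b, boy)
    instance(g, go-01)
    ARG0(w, b)
    ARG1(w, g)
    ARG0(g, b)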
21 Following (Langkilde and Knight, 1998) and (Langkilde-Geary, 2002), we refer to this semantic representation as AMR (Abstract Meaning Representation). [sent-26, score-0.084]
22 The difficulty is that variable names are not shared between the two AMRs, so there are multiple ways to compute the propositional overlap based on different variable mappings. [sent-31, score-0.332]
23 We therefore define the smatch score (for semantic match) as the maximum f-score obtainable via a one-to-one matching of variables between the two AMRs. [sent-32, score-0.995]
24 In the example above, there are six ways to match up variables between the two AMRs: [sent-33, score-0.153]

    x=a, y=b, z=c :  M=4  P=4/5  R=4/6  F=0.73
    x=a, y=c, z=b :  M=1  P=1/5  R=1/6  F=0.18
    x=b, y=a, z=c :  M=0  P=0/5  R=0/6  F=0.00
    x=b, y=c, z=a :  M=0  P=0/5  R=0/6  F=0.00
    x=c, y=a, z=b :  M=0  P=0/5  R=0/6  F=0.00
    x=c, y=b, z=a :  M=2  P=2/5  R=2/6  F=0.36

25 Here, M is the number of propositional triples that agree given a variable mapping, P is the precision of the second AMR against the first, R is its recall, and F is its f-score, the harmonic mean 2PR/(P + R); for the first row, 2 · (4/5) · (4/6) / (4/5 + 4/6) ≈ 0.73. [sent-40, score-0.351]
26 Exhaustively enumerating all variable mappings requires computing the f-score for n! one-to-one variable mappings. [sent-43, score-0.198]
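To make the definition concrete, here is a minimal brute-force sketch in Python. It is not the released tool's implementation; it assumes AMRs are given as sets of (relation, arg1, arg2) triples and that AMR2 has no more variables than AMR1.

    from itertools import permutations

    def prf(triples1, triples2, mapping):
        # Rename AMR2's variables into AMR1's namespace, then count the
        # propositional triples that agree under this mapping.
        renamed = {(rel, mapping.get(a, a), mapping.get(b, b))
                   for (rel, a, b) in triples2}
        m = len(renamed & set(triples1))
        p = m / len(triples2)  # precision of the second AMR against the first
        r = m / len(triples1)  # recall
        f = 2 * p * r / (p + r) if p + r else 0.0
        return m, p, r, f

    def brute_force_smatch(triples1, triples2, vars1, vars2):
        # O(n!) enumeration over one-to-one variable mappings: fine for the
        # three-variable example above, infeasible for real sentences.
        return max(prf(triples1, triples2, dict(zip(vars2, perm)))[3]
                   for perm in permutations(vars1, len(vars2)))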
27 3 Computing the Metric This section describes how to compute the smatch score. [sent-51, score-0.69]
28 For example, the baseline concept-matching method would match a in the first AMR example with x in the second AMR example of Section 2, because both are instances of want-01. [sent-57, score-0.033]
29 If there are two or more variables to choose from, we pick the first available one. [sent-58, score-0.12]
30 We can get an optimal solution using integer linear programming (ILP). [sent-61, score-0.072]
31 We create two types of variables: (Variable mapping) v_ij = 1 iff the i-th variable in AMR1 is mapped to the j-th variable in AMR2 (otherwise v_ij = 0); (Triple match) t_kl = 1 iff AMR1 triple k matches AMR2 triple l (otherwise t_kl = 0). [sent-62, score-0.797]
32 A triple relation1(x, y) matches relation2(w, z) iff relation1 = relation2, v_xw = 1, and either v_yz = 1 or y and z are the same concept. [sent-63, score-0.337]
33 Finally, we ask the ILP solver to maximize Σ_{k,l} t_kl, which denotes the maximum number of matching triples and leads to the smatch score. [sent-65, score-0.949]
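A minimal sketch of this integer linear program using the PuLP library (an illustrative choice; the paper does not prescribe a solver, and the triple format, helper names, and the assumption that first arguments are always variables are mine):

    import pulp

    def max_matching_triples(triples1, triples2, vars1, vars2):
        prob = pulp.LpProblem("smatch", pulp.LpMaximize)
        # v[i, j] = 1 iff variable i of AMR1 is mapped to variable j of AMR2.
        v = {(i, j): pulp.LpVariable(f"v_{i}_{j}", cat="Binary")
             for i in vars1 for j in vars2}
        for i in vars1:  # each variable participates in at most one mapping
            prob += pulp.lpSum(v[i, j] for j in vars2) <= 1
        for j in vars2:
            prob += pulp.lpSum(v[i, j] for i in vars1) <= 1
        # t[k, l] = 1 iff triple k of AMR1 matches triple l of AMR2.
        t = {}
        for k, (rel1, x, y) in enumerate(triples1):
            for l, (rel2, w, z) in enumerate(triples2):
                if rel1 != rel2:
                    continue  # relations must be equal for a match
                tv = pulp.LpVariable(f"t_{k}_{l}", cat="Binary")
                prob += tv <= v[x, w]  # first arguments must be mapped
                if y in vars1 and z in vars2:
                    prob += tv <= v[y, z]  # both second arguments are variables
                elif y != z:
                    prob += tv == 0  # unequal concepts can never match
                t[k, l] = tv
        prob += pulp.lpSum(t.values())  # objective: number of matching triples
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return int(pulp.value(prob.objective))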
34 Finally, we develop a portable heuristic algorithm that does not require an ILP solver.¹ [sent-67, score-0.029]
35 We begin with m random one-to-one mappings between the m variables of AMR1 and the n variables of AMR2. [sent-69, score-0.331]
36 Each variable mapping is a pair (i, map(i)) with 1 ≤ i ≤ m and 1 ≤ map(i) ≤ n. [sent-70, score-0.149]
37 We refer to the m mappings as a variable mapping state. [sent-71, score-0.114]
38 We first generate a random initial variable mapping state, compute its triple match number, then hill-climb via two types of small changes: 1. [sent-72, score-0.366]
39 Move one of the m mappings to a currently-unmapped variable from the n variables of AMR2. [sent-73, score-0.165]
40 Any variable mapping state has m(n − m) + m(m − 1) = m(n − 1) neighbors during the hill-climbing search. [sent-76, score-0.149]
41 We greedily choose the best neighbor, repeating until no neighbor improves the number of triple matches. [sent-77, score-0.156]
42 We experiment with two modifications to the greedy search: (1) executing multiple random restarts to avoid local optima, and (2) using our Baseline concept matching (“smart initialization”) instead of random initialization. [sent-78, score-0.169]
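A compact sketch of this hill-climbing search with random restarts, reusing the triple format above. Names are illustrative; the second change type was elided in this extraction, but the m(m − 1) term above indicates it swaps two existing mappings, and for brevity the sketch implements only the "move" neighbors and random initialization, assuming AMR1 has m ≤ n variables:

    import random

    def n_matches(triples1, triples2, mapping):
        # Triple matches with AMR1 renamed into AMR2's variable namespace.
        renamed = {(rel, mapping.get(a, a), mapping.get(b, b))
                   for (rel, a, b) in triples1}
        return len(renamed & set(triples2))

    def hill_climb(triples1, triples2, vars1, vars2, restarts=4, seed=0):
        rng, best = random.Random(seed), 0
        for _ in range(restarts + 1):
            # Random initial state: map AMR1's m variables into AMR2's n.
            mapping = dict(zip(vars1, rng.sample(vars2, len(vars1))))
            while True:
                cur = n_matches(triples1, triples2, mapping)
                # "Move" neighbors: re-point one mapping at an unmapped variable.
                neighbors = []
                for i in vars1:
                    for j in vars2:
                        if j not in mapping.values():
                            cand = dict(mapping)
                            cand[i] = j
                            neighbors.append(cand)
                if not neighbors:
                    break  # m == n: every variable is already mapped
                top = max(neighbors,
                          key=lambda c: n_matches(triples1, triples2, c))
                if n_matches(triples1, triples2, top) <= cur:
                    break  # no neighbor improves the count: local optimum
                mapping = top
            best = max(best, n_matches(triples1, triples2, mapping))
        return best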
43 There is unlikely to be an exact polynomial-time algorithm for computing smatch. [sent-80, score-0.033]
44 We can reduce the 0-1 Maximum Quadratic Assignment Problem (0-1-Max-QAP) (Nagarajan and Sviridenko, 2009) and the subgraph isomorphism problem² directly to the full smatch problem on graphs. [sent-81, score-0.752]
45 Fortunately, the next section shows that the smatch methods above are efficient and effective. [sent-84, score-0.659]
46 ²Thanks to David Chiang for observing the subgraph isomorphism reduction. [sent-89, score-0.093]
47 Our study has 4 annotators (A, B, C, D), who then converge on a consensus annotation E. [sent-92, score-0.132]
48 Each time annotators build AMRs for 4 sentences from the Wall Street Journal corpus. [sent-99, score-0.036]
49 Each individual smatch score is a document-level score of 4 AMR pairs.³ [sent-104, score-0.659]
50 ILP scores are optimal, so lower scores (in bold) indicate search errors. [sent-105, score-0.058]
51 Table 2 summarizes search accuracy as a percentage of smatch scores that equal that of ILP. [sent-106, score-0.688]
52 Results show that the restarts are essential for hill-climbing, and that 9 restarts are sufficient to obtain good quality. [sent-107, score-0.184]
53 The table also shows total runtimes over 200 AMR pairs (10 annotator pairs, 5 sentence groups, 4 AMR pairs per group). [sent-108, score-0.059]
54 Heuristic search with smart initialization and 4 restarts (S+4R) gives the best trade-off between accuracy and speed, so this is the setting we use in practice. [sent-109, score-0.244]
55 Figure 1 shows smatch scores of each annotator (A-D) against the consensus annotation (E). [sent-110, score-0.819]
56 The 3For documents containing multiple AMRs, we use the sum of matched triples over all AMR pairs to compute precision, recall, and f-score, much like corpus-level Bleu (Papineni et al. [sent-111, score-0.181]
57 Table 1: Inter-annotator smatch agreement for 5 groups of sentences, as computed with seven different methods (Base, ILP, R, 10R, S, S+4R, S+9R). [sent-113, score-0.699]
58 Table 2: Accuracy and running time (seconds) of various computing methods of smatch over 200 AMR pairs. [sent-123, score-0.692]
59 The plot demonstrates that, as time goes by, annotators reach better agreement with the consensus. [sent-124, score-0.076]
60 We also note that smatch is used to measure the accuracy of machine-generated AMRs. [sent-125, score-0.659]
61 (Jones et al., 2012) use it to evaluate automatic semantic parsing in a narrow domain, while Ulf Hermjakob⁴ has developed a heuristic algorithm that exploits and supplements Ontonotes annotations (Pradhan et al. [sent-127, score-0.179]
62 , 2007) in order to automatically create AMRs for Ontonotes sentences, with a smatch score of 0. [sent-128, score-0.659]
63 5 Related Work Related work on directly measuring the semantic representation includes the method in (Dridan and Oepen, 2011), which evaluates semantic parser output directly by comparing semantic substructures, though they require an alignment between sentence spans and semantic sub-structures. [sent-130, score-0.396]
64 In contrast, our metric does not require the align- (⁴personal communication) [Figure 1: Smatch scores of annotators (A-D) against the consensus annotation (E) over time.] [sent-131, score-0.261]
65 ment between an input sentence and its semantic analysis. [sent-132, score-0.084]
66 (Allen et al., 2008) propose a metric which computes the maximum score by any alignment between LF graphs, but they do not address how to determine the alignments. [sent-134, score-0.157]
67 6 Conclusion and Future Work We present an evaluation metric for whole-sentence semantic analysis, and show that it can be computed efficiently. [sent-135, score-0.25]
68 We use the metric to measure semantic annotation agreement rates and parsing accuracy. [sent-136, score-0.297]
69 In the future, we plan to investigate how to adapt smatch to other semantic representations. [sent-137, score-0.743]
70 7 Acknowledgements We would like to thank David Chiang, Hui Zhang, other ISI colleagues and our anonymous reviewers for their thoughtful comments. [sent-138, score-0.027]
wordName wordTfidf (topN-words)
[('smatch', 0.659), ('amr', 0.333), ('amrs', 0.231), ('triples', 0.15), ('ance', 0.144), ('inst', 0.138), ('ilp', 0.129), ('triple', 0.127), ('variables', 0.12), ('smart', 0.12), ('boy', 0.118), ('wz', 0.116), ('propositional', 0.101), ('variable', 0.1), ('metric', 0.1), ('restarts', 0.092), ('tkl', 0.087), ('semantic', 0.084), ('xy', 0.083), ('initialisz', 0.066), ('irmanbdionmg', 0.066), ('vxw', 0.066), ('vyz', 0.066), ('wholesentence', 0.066), ('xvij', 0.066), ('mappings', 0.065), ('consensus', 0.063), ('ontonotes', 0.062), ('iff', 0.054), ('langkilde', 0.054), ('vij', 0.054), ('admiralty', 0.054), ('dridan', 0.054), ('nagarajan', 0.051), ('isomorphism', 0.051), ('rey', 0.051), ('overlap', 0.05), ('mapping', 0.049), ('kingsbury', 0.048), ('usc', 0.046), ('marina', 0.046), ('allen', 0.044), ('suite', 0.043), ('del', 0.043), ('subgraph', 0.042), ('pradhan', 0.042), ('structures', 0.04), ('parsing', 0.04), ('agreement', 0.04), ('tang', 0.038), ('snover', 0.038), ('integer', 0.037), ('annotators', 0.036), ('annotator', 0.035), ('wants', 0.035), ('programming', 0.035), ('zettlemoyer', 0.035), ('plus', 0.034), ('knight', 0.033), ('computing', 0.033), ('annotation', 0.033), ('match', 0.033), ('initialization', 0.032), ('evaluates', 0.031), ('compute', 0.031), ('jones', 0.03), ('correctness', 0.03), ('neighbor', 0.029), ('quadratic', 0.029), ('heuristic', 0.029), ('mesa', 0.029), ('dordrecht', 0.029), ('obtainable', 0.029), ('hillclimbing', 0.029), ('scores', 0.029), ('alignment', 0.029), ('maximum', 0.028), ('vari', 0.027), ('nual', 0.027), ('thoughtful', 0.027), ('parsons', 0.027), ('constructors', 0.027), ('exploits', 0.026), ('random', 0.026), ('degree', 0.026), ('logical', 0.026), ('palmer', 0.026), ('matching', 0.025), ('generalpurpose', 0.025), ('honor', 0.025), ('map', 0.025), ('bold', 0.025), ('assignment', 0.025), ('tr', 0.024), ('hyperedge', 0.024), ('runtimes', 0.024), ('exhaustively', 0.024), ('oepen', 0.024), ('matches', 0.024), ('mapped', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
Author: Shu Cai ; Kevin Knight
Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate wholesentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.
2 0.078377672 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
Author: Chen Li ; Xian Qian ; Yang Liu
Abstract: In this paper, we propose a bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework. For each bigram, a regression model is used to estimate its frequency in the reference summary. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations. We also conducted various analyses to show the impact of bigram selection, weight estimation, and ILP setup.
3 0.059825335 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
Author: Dan Goldwasser ; Dan Roth
Abstract: Semantic parsing is a domain-dependent process by nature, as its output is defined over a set of domain symbols. Motivated by the observation that interpretation can be decomposed into domain-dependent and independent components, we suggest a novel interpretation model, which augments a domain-dependent model with abstract information that can be shared by multiple domains. Our experiments show that this type of information is useful and can reduce the annotation effort significantly when moving between domains.
4 0.058459882 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
Author: Ndapandula Nakashole ; Tomasz Tylenda ; Gerhard Weikum
Abstract: Methods for information extraction (IE) and knowledge base (KB) construction have been intensively studied. However, a largely under-explored case is tapping into highly dynamic sources like news streams and social media, where new entities are continuously emerging. In this paper, we present a method for discovering and semantically typing newly emerging out-of-KB entities, thus improving the freshness and recall of ontology-based IE and improving the precision and semantic rigor of open IE. Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. Our experimental evaluation, based on crowdsourced user studies, shows our method performing significantly better than prior work.
5 0.053873774 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
Author: David Chiang ; Jacob Andreas ; Daniel Bauer ; Karl Moritz Hermann ; Bevan Jones ; Kevin Knight
Abstract: Hyperedge replacement grammar (HRG) is a formalism for generating and transforming graphs that has potential applications in natural language understanding and generation. A recognition algorithm due to Lautemann is known to be polynomial-time for graphs that are connected and of bounded degree. We present a more precise characterization of the algorithm’s complexity, an optimization analogous to binarization of context-free grammars, and some important implementation details, resulting in an algorithm that is practical for natural-language applications. The algorithm is part of Bolinas, a new software toolkit for HRG processing.
6 0.051967062 242 acl-2013-Mining Equivalent Relations from Linked Data
7 0.049625073 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
8 0.049140282 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
9 0.048195086 314 acl-2013-Semantic Roles for String to Tree Machine Translation
10 0.047911145 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
11 0.045904309 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric
12 0.044757433 237 acl-2013-Margin-based Decomposed Amortized Inference
13 0.043525305 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
14 0.0432676 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
15 0.04192353 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
16 0.039746739 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation
17 0.039407223 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
18 0.0392133 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning
19 0.03895786 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection
20 0.038844287 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
topicId topicWeight
[(0, 0.114), (1, -0.007), (2, -0.011), (3, -0.037), (4, -0.037), (5, 0.016), (6, 0.003), (7, -0.021), (8, -0.017), (9, -0.012), (10, -0.03), (11, 0.02), (12, -0.05), (13, -0.061), (14, -0.006), (15, -0.004), (16, -0.003), (17, 0.009), (18, 0.033), (19, 0.019), (20, 0.001), (21, 0.005), (22, -0.026), (23, 0.041), (24, 0.0), (25, 0.025), (26, -0.004), (27, 0.031), (28, -0.026), (29, 0.031), (30, 0.027), (31, -0.039), (32, -0.02), (33, 0.024), (34, 0.041), (35, -0.017), (36, -0.03), (37, 0.002), (38, 0.018), (39, 0.0), (40, 0.064), (41, 0.076), (42, 0.004), (43, 0.033), (44, -0.008), (45, 0.056), (46, -0.03), (47, -0.047), (48, -0.018), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.86679685 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
Author: Shu Cai ; Kevin Knight
Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate wholesentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.
2 0.60464269 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
Author: Gerasimos Lampouras ; Ion Androutsopoulos
Abstract: We present an ILP model of concept-totext generation. Unlike pipeline architectures, our model jointly considers the choices in content selection, lexicalization, and aggregation to avoid greedy decisions and produce more compact texts.
3 0.58564609 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics
Author: Pieter Wellens ; Remi van Trijp ; Katrien Beuls ; Luc Steels
Abstract: Fluid Construction Grammar (FCG) is an open-source computational grammar formalism that is becoming increasingly popular for studying the history and evolution of language. This demonstration shows how FCG can be used to operationalise the cultural processes and cognitive mechanisms that underlie language evolution and change.
4 0.58044446 36 acl-2013-Adapting Discriminative Reranking to Grounded Language Learning
Author: Joohyun Kim ; Raymond Mooney
Abstract: We adapt discriminative reranking to improve the performance of grounded language acquisition, specifically the task of learning to follow navigation instructions from observation. Unlike conventional reranking used in syntactic and semantic parsing, gold-standard reference trees are not naturally available in a grounded setting. Instead, we show how the weak supervision of response feedback (e.g. successful task completion) can be used as an alternative, experimentally demonstrating that its performance is comparable to training on gold-standard parse trees.
5 0.57473779 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ε) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
6 0.55699676 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints
7 0.54590291 176 acl-2013-Grounded Unsupervised Semantic Parsing
8 0.53104848 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars
9 0.51835054 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
10 0.51286429 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
11 0.51249373 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
12 0.50687999 311 acl-2013-Semantic Neighborhoods as Hypergraphs
13 0.50179917 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
14 0.49088001 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint
15 0.48973471 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
16 0.48878965 349 acl-2013-The mathematics of language learning
17 0.48361856 37 acl-2013-Adaptive Parser-Centric Text Normalization
18 0.47777212 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
19 0.47617781 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information
20 0.47443956 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation
topicId topicWeight
[(0, 0.047), (6, 0.044), (11, 0.052), (14, 0.015), (15, 0.02), (24, 0.024), (26, 0.029), (35, 0.072), (42, 0.042), (48, 0.055), (64, 0.015), (70, 0.069), (85, 0.314), (88, 0.014), (90, 0.013), (95, 0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.75188315 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
Author: Shu Cai ; Kevin Knight
Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate wholesentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.
2 0.68531245 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers
Author: Yoav Goldberg ; Kai Zhao ; Liang Huang
Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each statetransition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real lineartime algorithm, which is verified empiri- cally. We further improve parsing speed by sharing feature-extraction and dotproduct across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treeb∼a2nxk sentences, a bnads are eosrd oenrs P eofn magnitude faster on much longer sentences.
3 0.64730161 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features
Author: Nathan Gilbert ; Ellen Riloff
Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.
4 0.63892114 6 acl-2013-A Java Framework for Multilingual Definition and Hypernym Extraction
Author: Stefano Faralli ; Roberto Navigli
Abstract: In this paper we present a demonstration of a multilingual generalization of Word-Class Lattices (WCLs), a supervised lattice-based model used to identify textual definitions and extract hypernyms from them. Lattices are learned from a dataset of automatically-annotated definitions from Wikipedia. We release a Java API for the programmatic use of multilingual WCLs in three languages (English, French and Italian), as well as a Web application for definition and hypernym extraction from user-provided sentences.
5 0.56252557 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
Author: Graham Neubig
Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on EnglishJapanese machine translation, and find that it is possible to achieve greater accuracy than translation using phrase-based and hierarchical-phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http : / /phont ron . com/t ravat ar
7 0.46668738 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
8 0.4662075 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
9 0.46389362 275 acl-2013-Parsing with Compositional Vector Grammars
10 0.45956933 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
11 0.45935562 265 acl-2013-Outsourcing FrameNet to the Crowd
12 0.45905939 249 acl-2013-Models of Semantic Representation with Visual Attributes
13 0.45905578 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
14 0.45899108 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions
15 0.45867109 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
16 0.45849332 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
17 0.45798746 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
18 0.45741799 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
19 0.45740426 250 acl-2013-Models of Translation Competitions
20 0.45700309 175 acl-2013-Grounded Language Learning from Video Described with Sentences