emnlp emnlp2012 emnlp2012-99 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vivek Srikumar ; Gourab Kundu ; Dan Roth
Abstract: This paper deals with the problem of predicting structures in the context of NLP. Typically, in structured prediction, an inference procedure is applied to each example independently of the others. In this paper, we seek to optimize the time complexity of inference over entire datasets, rather than individual examples. By considering the general inference representation provided by integer linear programs, we propose three exact inference theorems which allow us to re-use earlier solutions for certain instances, thereby completely avoiding possibly expensive calls to the inference procedure. We also identify several approximation schemes which can provide further speedup. We instantiate these ideas to the structured prediction task of semantic role labeling and show that we can achieve a speedup of over 2.5 using our approach while retaining the guarantees of exactness and a further speedup of over 3 using approximations that do not degrade performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper deals with the problem of predicting structures in the context of NLP. [sent-3, score-0.193]
2 Typically, in structured prediction, an inference procedure is applied to each example independently of the others. [sent-4, score-0.433]
3 In this paper, we seek to optimize the time complexity of inference over entire datasets, rather than individual examples. [sent-5, score-0.374]
4 By considering the general inference representation provided by integer linear programs, we propose three exact inference theorems which allow us to re-use earlier solutions for certain instances, thereby completely avoiding possibly expensive calls to the inference procedure. [sent-6, score-1.535]
5 We instantiate these ideas to the structured prediction task of semantic role labeling and show that we can achieve a speedup of over 2. [sent-8, score-0.532]
6 5 using our approach while retaining the guarantees of exactness and a further speedup of over 3 using approximations that do not degrade performance. [sent-9, score-0.306]
7 1 Introduction Typically, in structured prediction applications, every example is treated independently and an inference algorithm is applied to each one of them. [sent-10, score-0.504]
8 , 2005) or its integer linear program variants (Riedel and Clarke, 2006; Martins et al. [sent-12, score-0.289]
9 each sentence separately and runs the inference algorithm to predict the parse tree. [sent-15, score-0.341]
10 Thus, the time complexity of inference over the test set is linear in the size of the corpus. [sent-16, score-0.441]
11 In this paper, we ask the following question: For a given task, since the inference procedure predicts structures from the same family of structures (dependency trees, semantic role structures, etc. [sent-17, score-0.747]
12 ), can the fact that we are running inference for a large number of examples help us improve the time complexity of inference? [sent-18, score-0.381]
13 In the dependency parsing example, this question translates to asking whether, having parsed many sentences, we can decrease the parsing time for the next sentence. [sent-19, score-0.162]
14 Since any combinatorial optimization problem can be phrased as an integer linear program (ILP), we frame inference problems as ILPs for the purpose of analysis. [sent-20, score-0.847]
15 By analyzing the objective functions of integer linear programs, we identify conditions when two ILPs have the same solution. [sent-21, score-0.404]
16 This allows us to reuse solutions of previously solved problems and theoretically guarantee the optimality of the solution. [sent-22, score-0.379]
17 Furthermore, in some cases, even when the conditions are not satisfied, we can reuse previous solutions with high probability of being correct. [sent-23, score-0.206]
18 Given the extensive use of integer linear programs for structured prediction in Natural Language Processing over the last few years, these ideas can be applied broadly to NLP problems. [sent-24, score-0.614]
19 We instantiate our improved inference approaches in the structured prediction task of semantic role labeling, where we use an existing implementation and a previously trained model that is based on the approach of (Punyakanok et al. [sent-25, score-0.666]
20 We merely modify the inference procedure. [sent-27, score-0.341]
21 Table 1: The speedup for semantic role labeling corresponding to the three theorems described in this paper. [sent-31, score-0.458]
22 These theorems guarantee the optimality of the solution, thus ensuring that the speedup is not accompanied by any loss in performance. [sent-32, score-0.408]
23 Allowing small violations of the conditions of the theorems provides an even higher improvement in speedup (over 3), without loss of performance. [sent-35, score-0.399]
24 We pose the problem of optimizing inference costs over entire datasets rather than individual examples. [sent-37, score-0.436]
25 We identify equivalence classes of ILP problems and use this notion to prove exact conditions under which no inference is required. [sent-40, score-0.683]
26 These conditions lead to algorithms that can speed up inference without losing the exactness guarantees. [sent-41, score-0.558]
27 We also use these conditions to develop approximate inference algorithms that can provide a further speedup. [sent-42, score-0.415]
28 We apply our approach to the structured prediction task of semantic role labeling. [sent-44, score-0.261]
29 By not having to perform inference on instances that are equivalent to previously seen ones, we show a significant speedup in terms of the number of times inference needs to be performed. [sent-45, score-0.719]
30 The rest of this paper is organized as follows: In section 2, we formulate the problem of amortized inference and provide motivation for why amortized 1115 gains can be possible. [sent-47, score-1.084]
31 This leads to the theoretical discussion in section 3, where we present the meta-algorithm for amortized inference along with several exact and approximate inference schemes. [sent-48, score-1.037]
32 We instantiate these schemes for the task of semantic role labeling (Section 4). [sent-49, score-0.295]
33 2 Motivation Many NLP tasks can be phrased as structured prediction problems, where the goal is to jointly assign values to many inference variables while accounting for possible dependencies among them. [sent-51, score-0.637]
34 In general, the inference problem can be formulated and solved as integer linear programs (ILPs). [sent-53, score-0.809]
35 Following (Roth and Yih, 2004), integer linear programs have been used broadly in NLP. [sent-54, score-0.292]
36 , 2008) dealt with semantic role labeling with this technique. [sent-58, score-0.167]
37 The standard approach to framing dependency parsing as an integer linear program was introduced by (Riedel and Clarke, 2006), who converted the MST parser of (McDonald et al. [sent-60, score-0.399]
38 The goal of inference is to select the maximum spanning tree of this weighted graph. [sent-63, score-0.382]
39 2.1 Problem Formulation In this work, we consider the general inference problem of solving a 0-1 integer linear program. [sent-65, score-0.601]
40 Since structured prediction assigns values to a collection of inter-related binary decisions, we denote the ith binary decision by yi ∈ {0, 1} and the entire structure as y, the vector composed of all the binary decisions. [sent-70, score-0.351]
41 In our running example, each edge in the weighted graph generates a single decision variable (for unlabeled dependency parsing). [sent-71, score-0.171]
42 We denote the entire collection of weights by the vector c, forming the objective for this ILP. [sent-73, score-0.188]
43 Without loss of generality, these constraints can be expressed using linear inequalities over the inference variables, which we write as M^T y ≤ b for a real-valued matrix M and a vector b. [sent-75, score-0.547]
44 Now, the overall goal of inference is to find the highest scoring structure. [sent-77, score-0.341]
45 Thus, we can frame inference as an optimization problem p with n inference variables as follows: argmax_{y ∈ {0,1}^n} c^T y (1), subject to M^T y ≤ b (2). [sent-78, score-0.456]
46 For brevity, we denote the space of feasible solutions that satisfy the constraints for the ILP problem p as Kp = {y ∈ {0, 1}^n | M^T y ≤ b}. [sent-79, score-0.202]
47 We refer to Kp as the feasible set for the inference problem p and yp as its solution. [sent-81, score-0.478]
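To make the 0-1 ILP representation above concrete, the following is a minimal Python sketch (the class and function names and the brute-force enumeration are our own illustrative choices, not the implementation used in the paper; exhaustive enumeration is only viable for toy-sized problems):

```python
import itertools
import numpy as np

class ILP:
    """A 0-1 integer linear program: maximize c^T y subject to M^T y <= b, y in {0,1}^n."""
    def __init__(self, c, M, b):
        self.c = np.asarray(c, dtype=float)  # objective coefficients, one per binary variable
        self.M = np.asarray(M, dtype=float)  # constraint matrix, used as M^T y <= b
        self.b = np.asarray(b, dtype=float)  # constraint bounds

    def feasible(self, y):
        return bool(np.all(self.M.T @ y <= self.b))


def solve_brute_force(p):
    """Exhaustively enumerate {0,1}^n; only viable for tiny toy problems."""
    n = len(p.c)
    best_y, best_val = None, float("-inf")
    for bits in itertools.product([0, 1], repeat=n):
        y = np.array(bits, dtype=float)
        if p.feasible(y) and p.c @ y > best_val:
            best_y, best_val = y, p.c @ y
    return best_y


# Toy problem: three binary variables, at most two of them may be active.
p = ILP(c=[2.0, -1.0, 3.0], M=np.ones((3, 1)), b=[2.0])
print(solve_brute_force(p))  # -> [1. 0. 1.]
```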
48 In the worst case, solving integer linear programs is known to be NP-hard. [sent-82, score-0.417]
49 For structured prediction problems seen in NLP, we typically solve many instances of inference problems. [sent-84, score-0.65]
50 In this paper, we investigate whether an inference algorithm can use previous predictions to speed up inference time, thus giving us an amortized gain in inference time over the lifetime of the program. [sent-85, score-1.521]
51 We refer to inference algorithms that have this capability as amortized inference algorithms. [sent-86, score-1.037]
52 Over the lifetime of the dependency parser, we create one inference instance (that is, one ILP) per sentence and solve it. [sent-88, score-0.505]
53 An amortized inference algorithm becomes faster at parsing as it parses more and more sentences. [sent-89, score-0.748]
54 2.2 Why can inference costs be amortized over datasets? [sent-91, score-0.758]
55 In the rest of this section, we will argue that the time cost of inference can be amortized because of the nature of inference in NLP tasks. [sent-92, score-1.037]
56 Our argument is based on two observations, which are summarized in Figure (1): (1) Though the space of possible structures may be large, only a very small fraction of these occur. [sent-93, score-0.203]
57 (2) The distribution of observed structures is heavily skewed towards a small number of them. [sent-94, score-0.154]
58 Figure 1 (Examples x’s → ILP formulation → ILPs p’s → Inference → Solutions y’s): For a structured prediction task, the inference problem p for an example x needs to be formulated before solving it to get the structure y. [sent-95, score-0.54]
59 In structured prediction problems seen in NLP, while an exponential number of structures is possible for a given instance, in practice, only a small fraction of these ever occur. [sent-96, score-0.391]
60 This figure illustrates the empirical observation that there are fewer inference problems p’s than examples, and even fewer observed structures y’s. [sent-97, score-0.606]
61 Thus, for a sentence of size n, we could have 45^n possible structures, of which the inference process needs to choose one. [sent-100, score-0.376]
62 Note that the number of instances is not the number of unique examples of a given length, but the number of times an inference procedure is called for an input of a given size. [sent-104, score-0.446]
63 These plots show that for sentences of a given length, most structures (solutions) that are possible never occur, or occur very infrequently; only a few of the possible structures (solutions) actually occur frequently. [sent-107, score-0.344]
64 We see that the number of structures is much fewer than the number of instances for any sentence size. [sent-113, score-0.263]
65 The figures (2b) and (2c) show similar statistics for unlabeled dependency parsing and semantic role labeling. [sent-115, score-0.249]
66 In both cases, we see that the number of empirically observed structures is far fewer than the number of instances to be labeled. [sent-117, score-0.263]
67 Thus, for any given input size, the number of instances of that size (over the lifetime of the program) far exceeds the number of observed structures for that size. [sent-118, score-0.367]
68 Moreover, the number of observed structures is significantly smaller than the number of theoretically possible structures. [sent-119, score-0.193]
69 Thus, a small number of structures serve as the optimal structures for many inference instances of the same size. [sent-120, score-0.762]
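As an illustration only (this measurement script is our own sketch, not code from the paper), the statistics behind these observations could be gathered by counting, for each input size, how many inference calls are made and how many distinct solutions actually appear:

```python
from collections import Counter, defaultdict

def structure_statistics(predictions):
    """Count, per input size, how many inference calls were made and how many distinct
    structures actually occurred -- the kind of statistics summarized in Figure 2.
    Each structure must be hashable, e.g. a tuple of head indices for a parse tree."""
    per_size = defaultdict(Counter)
    for size, structure in predictions:
        per_size[size][structure] += 1
    for size, counts in sorted(per_size.items()):
        total = sum(counts.values())
        print(f"size {size}: {total} instances, {len(counts)} distinct structures, "
              f"most frequent covers {max(counts.values()) / total:.0%}")

# Toy usage with length-2 "parses" encoded as tuples of head indices.
structure_statistics([(2, (0, 1)), (2, (0, 1)), (2, (2, 0)), (2, (0, 1))])
```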
70 Our second observation deals with the distribution of structures for a given input size. [sent-121, score-0.193]
71 In all cases, we see that a few structures are most frequent. [sent-123, score-0.154]
72 We observed similar distributions of structures for all input sizes for dependency parsing and semantic role labeling as well. [sent-124, score-0.431]
73 Since the number of structures for a given example size is small, many examples x’s, and hence many inference problems p’s, are associated with the same structure y. [sent-125, score-0.604]
74 These observations suggest the possibility of getting an amortized gain in inference time by characterizing the set of inference problems that produce the same structure. [sent-126, score-1.154]
75 Then, for a new inference problem, if we can identify that it belongs to a known set, that is, will yield a solution that we have already seen, we do not have to run inference at all. [sent-127, score-0.783]
76 The second observation also suggests that this characterization of sets of problems that have the same solution can be done in a data-driven way because characterizing a small number of structures can give us high coverage. [sent-128, score-0.372]
77 3 Amortizing inference costs In this section, we present different schemes for amortized inference leading up to an inference meta-algorithm. [sent-129, score-1.504]
78 The meta-algorithm is agnostic to the underlying inference algorithm used by the problem and maintains the exactness properties of the underlying inference scheme. [sent-130, score-0.836]
79 That is, if we have an exact/approximate inference algorithm with certain guarantees, the meta-algorithm will have the same guarantees, but with a speedup. [sent-131, score-0.341]
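A minimal sketch of what such a meta-algorithm could look like, assuming the ILP container from the earlier sketch, a `base_solver` callable and a `can_reuse` predicate (all hypothetical names of our own; the exact theorems of Section 3.2 supply sound instantiations of `can_reuse`, while the default below only reuses a solution when the objective is unchanged):

```python
import numpy as np

def amortized_solve(q, cache, base_solver, can_reuse=None):
    """Solve ILP q, reusing a cached solution from the same equivalence class whenever a
    sufficient condition certifies it is still optimal; otherwise call the base solver.
    `cache` maps an equivalence-class key to a list of (objective, solution) pairs."""
    if can_reuse is None:
        # Trivial exact condition: reuse only when the objective vector is unchanged.
        # The theorems of Section 3.2 give much more permissive sound conditions.
        can_reuse = lambda c_p, y_p, c_q: np.array_equal(c_p, c_q)
    key = (len(q.c), q.M.tobytes(), q.b.tobytes())  # same #variables and constraints
    for c_p, y_p in cache.get(key, []):
        if can_reuse(c_p, y_p, q.c):
            return y_p                              # inference skipped entirely
    y_q = base_solver(q)                            # fall back to the expensive solver
    cache.setdefault(key, []).append((np.array(q.c), y_q))
    return y_q
```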
80 3.1 Notation For an integer linear program p with np variables, we denote its objective coefficients by cp and its feasible set by Kp. [sent-133, score-0.716]
81 We consider many instantiations of the inference problem and use superscripts to denote each individual instance. [sent-136, score-0.423]
82 Thus, we have a large collection of inference instances P = {p1, p2, · · · } along with their respective solutions {yp1, yp2, · · · }. [sent-137, score-0.413]
83 Two integer linear programs are said to be in the same equivalence class if they have the same number of inference variables and the same feasible set. [sent-139, score-0.929]
84 If [P] is an equivalence class of ILPs, we use the notation K[P] to denote its feasible set and n[P] to denote the number of variables. [sent-141, score-0.459]
85 Also, for a program p, we use the notation p ∼ [P] to indicate that it belongs to the equivalence class [P]. [sent-142, score-0.337]
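A small sketch of how such an equivalence class might be keyed in code (a simplification of our own: ILPs are grouped when they have the same number of variables and literally the same constraint representation, which is sufficient but not necessary for sharing a feasible set):

```python
import numpy as np

def equivalence_key(n, M, b):
    """Key an ILP by its variable count and its constraint representation (M^T y <= b).
    ILPs sharing a key have the same feasible set; identical feasible sets written with
    different constraint matrices would not be grouped by this simplification."""
    return (n, np.asarray(M, dtype=float).tobytes(), np.asarray(b, dtype=float).tobytes())

# ILPs built for two different sentences of the same length typically share a key,
# since their constraints encode the same structural requirements.
```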
86 3.2 Exact theorems Our goal is to characterize the set of objective functions which will have the same solution for a given equivalence class of problems. [sent-144, score-0.587]
87 For every inference variable that is active in the solution (i.e., whose value in the solution is 1), increasing the objective value will not change the optimal solution. [sent-146, score-0.442]
88 Similarly, for all other variables (whose value in the solution is 0), decreasing the objective value will not change the optimal solution. [sent-149, score-0.279]
89 This intuition gives us our first theorem for checking whether two ILPs have the same solution by looking at the difference between their objective coefficients. [sent-150, score-0.305]
90 Let p denote an inference problem posed as an integer linear program belonging to an equivalence class [P]. [sent-152, score-0.906]
91 Let q ∼ [P] be another inference instance in the same equivalence class. [sent-153, score-0.576]
92 Define δc = cq − cp to be the difference of the objective coefficients of the ILPs. [sent-154, score-0.342]
93 Under these conditions, if yp is the maximizer of the original objective, then it maximizes the new objective too. [sent-156, score-0.179]
94 Theorem 1 identifies perturbations of an ILP’s objective coefficients that will not change the optimal assignment. [sent-157, score-0.209]
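A sketch of this check in Python (the full theorem statement is not reproduced in this excerpt; the code encodes the sign condition suggested by the intuition above, so treat the exact condition as an assumption):

```python
import numpy as np

def same_solution_by_sign_condition(y_p, c_p, c_q):
    """Sufficient condition in the spirit of Theorem 1: if no active variable's coefficient
    decreased and no inactive variable's coefficient increased, the cached solution y_p
    also maximizes the new objective c_q.  Elementwise: (2*y_p - 1) * (c_q - c_p) >= 0."""
    y = np.asarray(y_p, dtype=float)
    delta = np.asarray(c_q, dtype=float) - np.asarray(c_p, dtype=float)
    return bool(np.all((2 * y - 1) * delta >= 0))

# y_p = [1, 0, 1]: raising the active coefficients and lowering the inactive one keeps
# y_p optimal, so the new ILP never has to be solved.
print(same_solution_by_sign_condition([1, 0, 1], [2.0, -1.0, 3.0], [2.5, -1.5, 3.0]))  # True
```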
95 Next, we will characterize the sets of objective values that will have the same solution using a criterion that is independent of the actual solution. [sent-158, score-0.24]
96 Suppose we have two ILPs p and q in an equivalence class [P] whose objective values are cp and cq respectively. [sent-159, score-0.433]
97 If both ILPs have the same solution y∗, then cp^T y∗ ≥ cp^T y and cq^T y∗ ≥ cq^T y for every feasible y; multiplying these inequalities by any two positive real numbers x1 and x2 and adding them shows us that y∗ is also the solution for the ILP in [P] which has the objective coefficients x1cp + x2cq. [sent-162, score-0.381]
98 Extending this to an arbitrary number of inference problems gives us our next theorem. [sent-163, score-0.415]
99 Let P denote a collection {p1, p2, · · · , pm} of m inference problems in the same equivalence class [P] and suppose that all the problems have the same solution, yp. [sent-165, score-0.86]
100 Let q ∼ [P] be a new inference program whose optimal solution is yq. [sent-166, score-0.406]
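The excerpt ends here, but one way a Theorem-2-style condition could be tested in code is to check whether cq lies in the cone spanned by the cached objectives; the nonnegative least-squares formulation below is our own illustrative choice (the derivation above uses strictly positive multipliers, so this is a sketch, not the paper's exact test):

```python
import numpy as np
from scipy.optimize import nnls

def in_conic_hull(c_q, cached_objectives, tol=1e-6):
    """Test whether c_q is a nonnegative combination of the cached objective vectors,
    via a nonnegative least-squares fit.  If it is, and the cached ILPs all share the
    solution y_p, the new ILP's solution is y_p and inference can be skipped."""
    A = np.column_stack([np.asarray(c, dtype=float) for c in cached_objectives])
    x, residual = nnls(A, np.asarray(c_q, dtype=float))
    return residual <= tol

# If p1 and p2 are in the same class and share the solution y_p, then an ILP whose
# objective is 2*c_p1 + 0.5*c_p2 also has y_p as its solution and can be skipped.
c_p1 = np.array([2.0, -1.0, 3.0])
c_p2 = np.array([1.0, -2.0, 1.0])
print(in_conic_hull(2 * c_p1 + 0.5 * c_p2, [c_p1, c_p2]))  # True
```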
wordName wordTfidf (topN-words)
[('amortized', 0.355), ('inference', 0.341), ('ilp', 0.201), ('equivalence', 0.194), ('ilps', 0.193), ('programs', 0.193), ('integer', 0.159), ('structures', 0.154), ('theorems', 0.153), ('otkens', 0.142), ('speedup', 0.138), ('atgged', 0.106), ('exactness', 0.106), ('ferquency', 0.106), ('ggiaword', 0.106), ('lifetime', 0.106), ('punyakanok', 0.106), ('satitstcis', 0.106), ('senetnces', 0.106), ('soluiotns', 0.106), ('wtih', 0.106), ('objective', 0.106), ('coefficients', 0.103), ('solution', 0.101), ('theorem', 0.098), ('structured', 0.092), ('mty', 0.092), ('solutions', 0.084), ('clarke', 0.082), ('denote', 0.082), ('problems', 0.074), ('conditions', 0.074), ('yp', 0.073), ('instances', 0.072), ('cp', 0.072), ('variables', 0.072), ('prediction', 0.071), ('amortizing', 0.071), ('cpty', 0.071), ('cqty', 0.071), ('inequalities', 0.071), ('soluiotn', 0.071), ('szie', 0.071), ('labeling', 0.069), ('role', 0.065), ('linear', 0.065), ('program', 0.065), ('instantiate', 0.064), ('schemes', 0.064), ('feasible', 0.064), ('yih', 0.062), ('costs', 0.062), ('guarantees', 0.062), ('ofr', 0.061), ('phrased', 0.061), ('cq', 0.061), ('dependency', 0.058), ('roth', 0.057), ('suppose', 0.054), ('parsing', 0.052), ('calls', 0.051), ('kp', 0.051), ('solved', 0.051), ('argument', 0.049), ('reuse', 0.048), ('optimality', 0.048), ('agnostic', 0.048), ('frame', 0.043), ('characterizing', 0.043), ('sth', 0.041), ('tthhee', 0.041), ('optimum', 0.041), ('spanning', 0.041), ('unlabeled', 0.041), ('riedel', 0.04), ('running', 0.04), ('deals', 0.039), ('theoretically', 0.039), ('combinatorial', 0.039), ('martins', 0.038), ('fewer', 0.037), ('notation', 0.037), ('speed', 0.037), ('tag', 0.037), ('plots', 0.036), ('solving', 0.036), ('constraints', 0.036), ('guarantee', 0.035), ('size', 0.035), ('log', 0.034), ('broadly', 0.034), ('loss', 0.034), ('entire', 0.033), ('unique', 0.033), ('gains', 0.033), ('characterize', 0.033), ('semantic', 0.033), ('decision', 0.032), ('sequences', 0.032), ('nlp', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction
Author: Vivek Srikumar ; Gourab Kundu ; Dan Roth
Abstract: This paper deals with the problem of predicting structures in the context of NLP. Typically, in structured prediction, an inference procedure is applied to each example independently of the others. In this paper, we seek to optimize the time complexity of inference over entire datasets, rather than individual examples. By considering the general inference representation provided by integer linear programs, we propose three exact inference theorems which allow us to re-use earlier solutions for certain instances, thereby completely avoiding possibly expensive calls to the inference procedure. We also identify several approximation schemes which can provide further speedup. We instantiate these ideas to the structured prediction task of semantic role labeling and show that we can achieve a speedup of over 2.5 using our approach while retaining the guarantees of exactness and a further speedup of over 3 using approximations that do not degrade performance.
2 0.12664969 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
Author: Sebastian Riedel ; David Smith ; Andrew McCallum
Abstract: Graph-based dependency parsers suffer from the sheer number of higher order edges they need to (a) score and (b) consider during optimization. Here we show that when working with LP relaxations, large fractions of these edges can be pruned before they are fully scored—without any loss of optimality guarantees and, hence, accuracy. This is achieved by iteratively parsing with a subset of higherorder edges, adding higher-order edges that may improve the score of the current solution, and adding higher-order edges that are implied by the current best first order edges. This amounts to delayed column and row generation in the LP relaxation and is guaranteed to provide the optimal LP solution. For second order grandparent models, our method considers, or scores, no more than 6–13% of the second order edges of the full model. This yields up to an eightfold parsing speedup, while pro- viding the same empirical accuracy and certificates of optimality as working with the full LP relaxation. We also provide a tighter LP formulation for grandparent models that leads to a smaller integrality gap and higher speed.
3 0.11062826 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
Author: Kristian Woodsend ; Mirella Lapata
Abstract: Multi-document summarization involves many aspects of content selection and surface realization. The summaries must be informative, succinct, grammatical, and obey stylistic writing conventions. We present a method where such individual aspects are learned separately from data (without any hand-engineering) but optimized jointly using an integer linear programme. The ILP framework allows us to combine the decisions of the expert learners and to select and rewrite source content through a mixture of objective setting, soft and hard constraints. Experimental results on the TAC-08 data set show that our model achieves state-of-the-art performance using ROUGE and significantly improves the informativeness of the summaries.
4 0.10319725 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Author: Alexander Rush ; Roi Reichart ; Michael Collins ; Amir Globerson
Abstract: State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentencelevel models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
5 0.088419639 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higherorder dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation ofthe third-ordermodel of Koo et al. (2010).
6 0.078932323 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
7 0.077666894 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
8 0.072702646 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
9 0.06930314 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
10 0.061614599 81 emnlp-2012-Learning to Map into a Universal POS Tagset
11 0.05952166 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
12 0.05891262 97 emnlp-2012-Natural Language Questions for the Web of Data
13 0.053450529 72 emnlp-2012-Joint Inference for Event Timeline Construction
14 0.052126791 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
15 0.048800774 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
16 0.047932915 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
17 0.045414343 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
18 0.044949625 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
19 0.044366341 56 emnlp-2012-Framework of Automatic Text Summarization Using Reinforcement Learning
20 0.043013096 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
topicId topicWeight
[(0, 0.185), (1, -0.034), (2, 0.087), (3, -0.055), (4, 0.015), (5, 0.014), (6, -0.018), (7, 0.01), (8, 0.042), (9, 0.134), (10, 0.079), (11, 0.072), (12, 0.022), (13, -0.135), (14, -0.04), (15, 0.019), (16, -0.062), (17, 0.042), (18, 0.168), (19, -0.054), (20, 0.03), (21, 0.237), (22, 0.024), (23, -0.186), (24, -0.001), (25, 0.008), (26, 0.05), (27, -0.1), (28, 0.02), (29, 0.12), (30, -0.097), (31, 0.018), (32, -0.014), (33, 0.057), (34, 0.09), (35, 0.195), (36, -0.017), (37, -0.007), (38, -0.004), (39, 0.161), (40, 0.052), (41, -0.059), (42, -0.087), (43, -0.037), (44, 0.044), (45, 0.115), (46, 0.004), (47, -0.171), (48, 0.126), (49, -0.153)]
simIndex simValue paperId paperTitle
same-paper 1 0.97761571 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction
Author: Vivek Srikumar ; Gourab Kundu ; Dan Roth
Abstract: This paper deals with the problem of predicting structures in the context of NLP. Typically, in structured prediction, an inference procedure is applied to each example independently of the others. In this paper, we seek to optimize the time complexity of inference over entire datasets, rather than individual examples. By considering the general inference representation provided by integer linear programs, we propose three exact inference theorems which allow us to re-use earlier solutions for certain instances, thereby completely avoiding possibly expensive calls to the inference procedure. We also identify several approximation schemes which can provide further speedup. We instantiate these ideas to the structured prediction task of semantic role labeling and show that we can achieve a speedup of over 2.5 using our approach while retaining the guarantees of exactness and a further speedup of over 3 using approximations that do not degrade performance.
2 0.7244491 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
Author: Sebastian Riedel ; David Smith ; Andrew McCallum
Abstract: Graph-based dependency parsers suffer from the sheer number of higher order edges they need to (a) score and (b) consider during optimization. Here we show that when working with LP relaxations, large fractions of these edges can be pruned before they are fully scored—without any loss of optimality guarantees and, hence, accuracy. This is achieved by iteratively parsing with a subset of higherorder edges, adding higher-order edges that may improve the score of the current solution, and adding higher-order edges that are implied by the current best first order edges. This amounts to delayed column and row generation in the LP relaxation and is guaranteed to provide the optimal LP solution. For second order grandparent models, our method considers, or scores, no more than 6–13% of the second order edges of the full model. This yields up to an eightfold parsing speedup, while pro- viding the same empirical accuracy and certificates of optimality as working with the full LP relaxation. We also provide a tighter LP formulation for grandparent models that leads to a smaller integrality gap and higher speed.
3 0.58367318 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Author: Alexander Rush ; Roi Reichart ; Michael Collins ; Amir Globerson
Abstract: State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentencelevel models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
4 0.45216668 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
Author: Kristian Woodsend ; Mirella Lapata
Abstract: Multi-document summarization involves many aspects of content selection and surface realization. The summaries must be informative, succinct, grammatical, and obey stylistic writing conventions. We present a method where such individual aspects are learned separately from data (without any hand-engineering) but optimized jointly using an integer linear programme. The ILP framework allows us to combine the decisions of the expert learners and to select and rewrite source content through a mixture of objective setting, soft and hard constraints. Experimental results on the TAC-08 data set show that our model achieves state-of-the-art performance using ROUGE and significantly improves the informativeness of the summaries.
5 0.39605507 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higherorder dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation ofthe third-ordermodel of Koo et al. (2010).
6 0.34481466 81 emnlp-2012-Learning to Map into a Universal POS Tagset
7 0.32528627 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
8 0.32267702 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure
9 0.29541734 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
10 0.264909 118 emnlp-2012-Source Language Adaptation for Resource-Poor Machine Translation
11 0.26090562 56 emnlp-2012-Framework of Automatic Text Summarization Using Reinforcement Learning
12 0.25670931 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
13 0.2484463 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
14 0.24361555 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
15 0.24084647 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
16 0.23140728 97 emnlp-2012-Natural Language Questions for the Web of Data
17 0.22307272 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
18 0.21937865 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
19 0.21555932 61 emnlp-2012-Grounded Models of Semantic Representation
20 0.21446304 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars
topicId topicWeight
[(2, 0.022), (14, 0.012), (16, 0.038), (25, 0.028), (34, 0.054), (60, 0.041), (63, 0.045), (64, 0.013), (65, 0.039), (70, 0.01), (74, 0.023), (76, 0.04), (80, 0.554), (86, 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.96174794 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction
Author: Vivek Srikumar ; Gourab Kundu ; Dan Roth
Abstract: This paper deals with the problem of predicting structures in the context of NLP. Typically, in structured prediction, an inference procedure is applied to each example independently of the others. In this paper, we seek to optimize the time complexity of inference over entire datasets, rather than individual examples. By considering the general inference representation provided by integer linear programs, we propose three exact inference theorems which allow us to re-use earlier solutions for certain instances, thereby completely avoiding possibly expensive calls to the inference procedure. We also identify several approximation schemes which can provide further speedup. We instantiate these ideas to the structured prediction task of semantic role labeling and show that we can achieve a speedup of over 2.5 using our approach while retaining the guarantees of exactness and a further speedup of over 3 using approximations that do not degrade performance.
2 0.66152561 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
Author: Michael Roth ; Anette Frank
Abstract: Generating coherent discourse is an important aspect in natural language generation. Our aim is to learn factors that constitute coherent discourse from data, with a focus on how to realize predicate-argument structures in a model that exceeds the sentence level. We present an important subtask for this overall goal, in which we align predicates across comparable texts, admitting partial argument structure correspondence. The contribution of this work is two-fold: We first construct a large corpus resource of comparable texts, including an evaluation set with manual predicate alignments. Secondly, we present a novel approach for aligning predicates across comparable texts using graph-based clustering with Mincuts. Our method significantly outperforms other alignment techniques when applied to this novel alignment task, by a margin of at least 6.5 percentage points in F1-score.
3 0.35546932 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky
Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.
4 0.35113087 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Author: Alexander Rush ; Roi Reichart ; Michael Collins ; Amir Globerson
Abstract: State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentencelevel models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
5 0.32954791 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
Author: Sebastian Riedel ; David Smith ; Andrew McCallum
Abstract: Graph-based dependency parsers suffer from the sheer number of higher order edges they need to (a) score and (b) consider during optimization. Here we show that when working with LP relaxations, large fractions of these edges can be pruned before they are fully scored—without any loss of optimality guarantees and, hence, accuracy. This is achieved by iteratively parsing with a subset of higherorder edges, adding higher-order edges that may improve the score of the current solution, and adding higher-order edges that are implied by the current best first order edges. This amounts to delayed column and row generation in the LP relaxation and is guaranteed to provide the optimal LP solution. For second order grandparent models, our method considers, or scores, no more than 6–13% of the second order edges of the full model. This yields up to an eightfold parsing speedup, while pro- viding the same empirical accuracy and certificates of optimality as working with the full LP relaxation. We also provide a tighter LP formulation for grandparent models that leads to a smaller integrality gap and higher speed.
6 0.32291549 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
7 0.31366917 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic
8 0.31303328 72 emnlp-2012-Joint Inference for Event Timeline Construction
9 0.31140047 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
10 0.30268055 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling
11 0.30025345 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
12 0.29692736 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
13 0.2810955 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
14 0.2801733 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
15 0.27801463 81 emnlp-2012-Learning to Map into a Universal POS Tagset
16 0.27707657 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction
17 0.27562639 75 emnlp-2012-Large Scale Decipherment for Out-of-Domain Machine Translation
18 0.2740247 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
19 0.27309337 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
20 0.27294543 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories